Abstract

The stability of low-rank matrix reconstruction is investigated in this paper. The $\ell_*$-constrained minimal singular value ($\ell_*$-CMSV) of the measurement operator is shown to determine the recovery performance of nuclear norm minimization based algorithms. Compared with the stability results using the matrix restricted isometry constant, the performance bounds established using the $\ell_*$-CMSV are more concise and tighter, and their derivations are less complex. The computationally amenable $\ell_*$-CMSV and its associated error bounds also have more transparent relationships with the Signal-to-Noise Ratio. Several random measurement ensembles are shown to have $\ell_*$-CMSVs bounded away from zero with high probability, as long as the number of measurements is relatively large.

Index Terms 

$\ell_*$-constrained minimal singular value, matrix Basis Pursuit, matrix Dantzig selector, matrix LASSO estimator, restricted isometry property
I. Introduction

The last decade witnessed the burgeoning of exploiting low-dimensional structures in signal processing, most notably sparseness for vectors [1], [2], low-rankness for matrices [3]-[5], and low-dimensional manifold structure for general non-linear data sets [6], [7]. This paper focuses on the stability problem of low-rank matrix reconstruction. Suppose $X\in\mathbb{R}^{n_1\times n_2}$ is a matrix of rank $r\ll\min\{n_1,n_2\}$; the low-rank matrix reconstruction problem aims to recover the matrix $X$ from a set of linear measurements $y$ corrupted by noise $w$:

$$y = \mathcal{A}(X) + w, \qquad (1)$$

where $\mathcal{A}:\mathbb{R}^{n_1\times n_2}\to\mathbb{R}^m$ is a linear measurement operator. Since the matrix $X$ lies in a low-dimensional sub-manifold of $\mathbb{R}^{n_1\times n_2}$, we expect that $m\ll n_1n_2$ measurements would suffice to reconstruct $X$ from $y$ by exploiting the signal structure. Application areas of model (1) include factor analysis, linear system realization [8], [9], matrix completion [10], [11], quantum state tomography [12], face recognition [13], [14], and Euclidean embedding [15], to name a few (see [3]-[5] and the references therein for discussions).

Several considerations motivate the study of the stability of low-rank matrix reconstruction. First, in practical problems the linear measurement operator $\mathcal{A}$ is usually used repeatedly to collect measurement vectors $y$ for different matrices $X$. Therefore, before taking the measurements, it is desirable to know how good the measurement operator $\mathcal{A}$ is as far as reconstructing $X$ is concerned. Second, a stability analysis offers a means to quantify the confidence in the reconstructed matrix, especially when there is no other way to justify the correctness of the reconstructed signal. In addition, as in the case of sparse signal reconstruction [16], in certain applications we have the freedom to design the measurement operator $\mathcal{A}$ by selecting the best one from a collection of operators, which requires a precise quantification of the goodness of any given operator. All these considerations suggest that the stability measure should be computable, an aspect usually overlooked in the literature.

This work parallels our previous work on the stability of sparse signal reconstruction. In [16], we demonstrated that the $\ell_1$-constrained minimal singular value ($\ell_1$-CMSV) of a measurement matrix quantifies the stability of sparse signal reconstruction. Several important random measurement ensembles were shown to have $\ell_1$-CMSVs bounded away from zero with a reasonable number of measurements. More importantly, we designed several algorithms to compute the $\ell_1$-CMSV of any given measurement matrix.

In the current work, we define the $\ell_*$-constrained minimal singular value ($\ell_*$-CMSV) of a linear operator to measure the stability of low-rank matrix reconstruction. A large class of random linear operators is also shown to have $\ell_*$-CMSVs bounded away from zero. The analysis for random linear operators acting on a matrix space is more challenging, and we need to employ advanced tools from geometric functional analysis and empirical processes. The computational aspect of the $\ell_*$-CMSV is left to future work.

Several works in the literature also address the problem of low-rank matrix reconstruction. Recht et al. study the recovery of $X$ in model (1) in the noiseless setting [3]. The matrix restricted isometry property (mRIP) is shown to guarantee exact recovery of $X$ subject to the measurement constraint $\mathcal{A}(X) = y$. Candès et al. consider the noisy problem and analyze the reconstruction performance of several convex relaxation algorithms [5]. The techniques used in this paper for deriving the error bounds in terms of the $\ell_*$-CMSV draw ideas from [5]. Our bounds are more concise and are expected to be tighter. In both works [3] and [5], several important random measurement ensembles are shown to have sufficiently small matrix restricted isometry constants (mRICs) for reasonably large $m$. Our procedures for establishing the parallel results for the $\ell_*$-CMSV are significantly different from those in [3] and [5]. By analogy to the $\ell_1$-CMSV, we expect that the $\ell_*$-CMSV is computationally more amenable than the mRIC [16].

The $\ell_*$-CMSV has several advantages over the mRIC in the stability analysis of low-rank matrix reconstruction. First, the error bounds involving the $\ell_*$-CMSV have more transparent relationships with the Signal-to-Noise Ratio. For example, consider the matrix Basis Pursuit algorithm: if we multiply the measurement operator $\mathcal{A}$ by a positive constant, the $\ell_*$-CMSV scales by the same constant and the error bound for the matrix Basis Pursuit scales inversely proportionally, while the mRIC and its associated error bounds have more complex scaling properties. Second, the $\ell_*$-CMSV is computationally more amenable than the mRIC. The discrete nature of the mRIC makes the design of algorithms for computing it very difficult. In contrast, many tools at our disposal can deal with the continuous formulation of the $\ell_*$-CMSV, for example, Lagrange multipliers or the Karush-Kuhn-Tucker conditions [17]. In particular, many algorithms for computing the smallest singular values of a matrix, such as the conjugate gradient method for Rayleigh quotient minimization [18], [19], might be adapted to the computation of the $\ell_*$-CMSV by incorporating one additional constraint. We will investigate these possibilities in follow-up works. In addition, the derivation of the $\ell_*$-CMSV bounds is less complicated and the resulting bounds have more concise forms. These bounds are also expected to be tighter than those in terms of the mRIC.

The paper is organized as follows. Section II previews our main results. Section III introduces the notation, the measurement model, three convex relaxation based recovery algorithms, and the definition and properties of the mRIC. Section IV is devoted to deriving error bounds in terms of the $\ell_*$-CMSV for the three convex relaxation algorithms. In Section V, we analyze the $\ell_*$-CMSV for isotropic and subgaussian measurement operators. The paper is concluded in Section VI.

II. Overview of the Main Results 

In this section, we preview the main results of this paper. All proofs are postponed to later sections. Throughout the paper, we assume $n_1\le n_2$, so that $\min\{n_1,n_2\} = n_1$ and $\max\{n_1,n_2\} = n_2$.

A. $\ell_*$-Constrained Singular Values

We first introduce a quantity that continuously extends the concept of rank for a given matrix $X$. It is also an extension of the $\ell_1$-sparsity level from vectors to matrices [16].

Definition 1 The $\ell_*$-rank of a non-zero matrix $X\in\mathbb{R}^{n_1\times n_2}$ is defined as

$$\tau(X) = \frac{\|X\|_*^2}{\|X\|_F^2} = \frac{\|\sigma(X)\|_1^2}{\|\sigma(X)\|_2^2}, \qquad (2)$$

where $\|\cdot\|_*$ is the nuclear norm of a matrix and $\|\cdot\|_F$ the Frobenius norm. We use $\sigma(X)\in\mathbb{R}^{n_1}$ to denote the vector of singular values of $X$ in decreasing order.

The scaling-invariant quantity $\tau(X)$ is indeed a measure of rank. To see this, suppose $\operatorname{rank}(X) = r$. Since $\sigma(X)$ has only $r$ non-zero entries, the Cauchy-Schwarz inequality gives $\|\sigma(X)\|_1\le\sqrt{r}\,\|\sigma(X)\|_2$, which implies that

$$\tau(X)\le r, \qquad (3)$$

with equality if and only if all non-zero singular values of $X$ are equal. Therefore, the more non-zero singular values $X$ has and the more evenly their magnitudes are distributed, the larger $\tau(X)$. In particular, if $X$ is of rank 1, then $\tau(X) = 1$; if $X$ is of full rank $n_1$ with all singular values having the same magnitude, then $\tau(X) = n_1$. However, if $X$ has $n_1$ non-zero singular values but their magnitudes are spread over a wide range, then its $\ell_*$-rank might be very small.
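A minimal NumPy sketch of Definition 1 can make the last two observations concrete. The function name `l_star_rank` and the test matrices are illustrative choices, not taken from the paper. The sketch computes $\tau(X)$ from the singular values and contrasts a rank-3 matrix with equal singular values ($\tau = 3$) against a full-rank matrix whose singular values decay geometrically ($\tau\approx 1.22$, far below its rank of 6).

```python
import numpy as np


def l_star_rank(X):
    """l_*-rank tau(X) = ||X||_*^2 / ||X||_F^2, as in (2)."""
    s = np.linalg.svd(X, compute_uv=False)  # sigma(X), decreasing order
    if not np.any(s > 0):
        raise ValueError("tau(X) is defined only for non-zero X")
    return s.sum() ** 2 / np.sum(s ** 2)


rng = np.random.default_rng(0)
n1, n2, r = 6, 8, 3

# Rank-3 matrix whose non-zero singular values all equal 1: tau equals the rank.
U, _ = np.linalg.qr(rng.standard_normal((n1, r)))
V, _ = np.linalg.qr(rng.standard_normal((n2, r)))
X_equal = U @ V.T

# Full-rank (rank 6) matrix with singular values 1, 0.1, ..., 1e-5: tau is far below 6.
Uf, _ = np.linalg.qr(rng.standard_normal((n1, n1)))
Vf, _ = np.linalg.qr(rng.standard_normal((n2, n1)))
X_spread = Uf @ np.diag(10.0 ** -np.arange(n1)) @ Vf.T

print(l_star_rank(X_equal))   # 3.0 up to rounding
print(l_star_rank(X_spread))  # about 1.22
```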

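Definition 2 For any $\tau\in[1,n_1]$ and any linear operator $\mathcal{A}:\mathbb{R}^{n_1\times n_2}\to\mathbb{R}^m$, define the $\ell_*$-constrained minimal singular value (abbreviated as $\ell_*$-CMSV) and the $\ell_*$-constrained maximal singular value of $\mathcal{A}$ by

$$\rho_\tau^{\min}(\mathcal{A}) = \inf_{X\neq 0,\ \tau(X)\le\tau}\frac{\|\mathcal{A}(X)\|_2}{\|X\|_F} \quad\text{and} \qquad (4)$$

$$\rho_\tau^{\max}(\mathcal{A}) = \sup_{X\neq 0,\ \tau(X)\le\tau}\frac{\|\mathcal{A}(X)\|_2}{\|X\|_F}, \qquad (5)$$

respectively. Because we mainly use $\rho_\tau^{\min}$ in this paper, for notational simplicity we sometimes write $\rho_\tau$ for $\rho_\tau^{\min}$ when this causes no confusion.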
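Computing $\rho_\tau^{\min}$ exactly is a non-convex problem whose solution this paper leaves open. A crude way to get a feel for Definition 2 is to evaluate the ratio in (4)-(5) at randomly sampled feasible matrices. Because any feasible point can only overestimate the infimum and underestimate the supremum, this yields an upper bound on $\rho_\tau^{\min}$ and a lower bound on $\rho_\tau^{\max}$, nothing more. The sketch below assumes $\mathcal{A}$ is stored as an array of matrices $A_k$ (the representation introduced in Section III-A); the helper names are hypothetical. It also checks the scaling property mentioned in Section I: multiplying $\mathcal{A}$ by $c$ multiplies every ratio by $c$.

```python
import numpy as np


def l_star_rank(X):
    s = np.linalg.svd(X, compute_uv=False)
    return s.sum() ** 2 / np.sum(s ** 2)


def apply_op(A_mats, X):
    """A(X) = [<A_1, X>, ..., <A_m, X>] for A_mats of shape (m, n1, n2)."""
    return np.einsum("kij,ij->k", A_mats, X)


def cmsv_sample_bracket(A_mats, tau, n_samples=2000, seed=0):
    """Return (hi_min, lo_max): an upper bound on rho_tau^min and a lower bound on
    rho_tau^max from randomly sampled feasible X. These are NOT the exact values."""
    rng = np.random.default_rng(seed)
    _, n1, n2 = A_mats.shape
    hi_min, lo_max = np.inf, 0.0
    for _ in range(n_samples):
        # Random rank-k matrix with k <= tau; by (3) this guarantees tau(X) <= tau.
        k = rng.integers(1, int(np.floor(tau)) + 1)
        X = rng.standard_normal((n1, k)) @ rng.standard_normal((k, n2))
        ratio = np.linalg.norm(apply_op(A_mats, X)) / np.linalg.norm(X, "fro")
        hi_min, lo_max = min(hi_min, ratio), max(lo_max, ratio)
    return hi_min, lo_max


rng = np.random.default_rng(1)
m, n1, n2 = 60, 5, 6
A_mats = rng.standard_normal((m, n1, n2)) / np.sqrt(m)
hi_min, lo_max = cmsv_sample_bracket(A_mats, tau=2.0)
# Scaling the operator by c = 3 scales every ratio, hence both bounds, by 3.
hi_min3, lo_max3 = cmsv_sample_bracket(3.0 * A_mats, tau=2.0)
print(hi_min, lo_max, hi_min3 / hi_min)
```

Sampling only low-rank matrices explores a small part of the feasible set, so the bracket can be loose.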

For an operator $\mathcal{A}$, a non-zero $\rho_\tau(\mathcal{A}) = \rho_\tau^{\min}(\mathcal{A})$ roughly means that $\mathcal{A}$ is invertible when restricted to the set $\{X\in\mathbb{R}^{n_1\times n_2}:\tau(X)\le\tau\}$. Equivalently, the intersection of the null space of $\mathcal{A}$ and $\{X\in\mathbb{R}^{n_1\times n_2}:\tau(X)\le\tau\}$ contains only the zero matrix. The value of $\rho_\tau(\mathcal{A})$ measures the invertibility of $\mathcal{A}$ restricted to $\{\tau(X)\le\tau\}$. As we will see in Section IV, the error matrices of convex relaxation algorithms have small $\ell_*$-ranks. Therefore, the error matrix is distinguishable from the zero matrix given its image under $\mathcal{A}$. Put another way, given the noise-corrupted $\mathcal{A}(X)$, a signal matrix $X$ is distinguishable from $X+H$ as long as the noise acts in such a way that the error matrix $H$ has a small $\ell_*$-rank. This explains roughly why $\rho_\tau(\mathcal{A})$ determines the performance of convex relaxation algorithms.

We next define an important class of random operator ensembles, the isotropic and subgaussian ensemble, after introducing some prerequisite concepts. For a scalar random variable $x$, the Orlicz $\psi_2$ norm is defined as

$$\|x\|_{\psi_2} = \inf\left\{t>0:\ \mathbb{E}\exp\left(\frac{|x|^2}{t^2}\right)\le 2\right\}. \qquad (6)$$

Markov's inequality immediately implies that a random variable $x$ with finite $\|x\|_{\psi_2}$ has a subgaussian tail:

$$\mathbb{P}(|x|\ge t)\le 2\exp\left(-ct^2/\|x\|_{\psi_2}^2\right). \qquad (7)$$

The converse is also true, i.e., if $x$ has a subgaussian tail $\exp(-t^2/K^2)$, then $\|x\|_{\psi_2}\le cK$.
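Definition (6) can be evaluated on a sample by replacing the expectation with an empirical mean and locating the infimum by bisection, since the mean of $\exp(x^2/t^2)$ decreases in $t$. The sketch below is illustrative (the function name and the bracketing interval are assumptions). For a Rademacher ($\pm 1$) variable, $x^2 = 1$, so the answer is exactly $1/\sqrt{\ln 2}\approx 1.2011$. For a standard Gaussian the exact value is $\sqrt{8/3}$, because $\mathbb{E}\exp(x^2/t^2) = (1-2/t^2)^{-1/2}$; however, sample estimates converge slowly there because of the heavy tail of $\exp(x^2/t^2)$.

```python
import numpy as np


def psi2_norm_empirical(samples, lo=1e-3, hi=1e3, iters=200):
    """Empirical Orlicz psi_2 norm from (6): the smallest t > 0 with
    mean(exp(x^2 / t^2)) <= 2, found by bisection (the mean decreases in t).
    Assumes hi is feasible and lo is not."""
    x2 = np.asarray(samples, dtype=float) ** 2

    def moment(t):
        # Overflow to inf only signals that t is too small, so silence the warning.
        with np.errstate(over="ignore"):
            return np.mean(np.exp(x2 / t ** 2))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if moment(mid) <= 2.0:
            hi = mid  # mid is feasible: the infimum is at most mid
        else:
            lo = mid
    return hi


rng = np.random.default_rng(0)
signs = rng.choice([-1.0, 1.0], size=10_000)
print(psi2_norm_empirical(signs))  # 1/sqrt(ln 2), about 1.2011
```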

Definition 3 A random vector $x\in\mathbb{R}^n$ is called isotropic and subgaussian with constant $L$ if $\mathbb{E}|\langle x,u\rangle|^2 = \|u\|_2^2$ and $\|\langle x,u\rangle\|_{\psi_2}\le L\|u\|_2$ hold for any $u\in\mathbb{R}^n$.
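The isotropy half of Definition 3 is easy to check by simulation for the two standard examples mentioned below, standard Gaussian vectors and i.i.d. sign vectors. This NumPy sketch (sizes and seed are arbitrary) estimates $\mathbb{E}|\langle x,u\rangle|^2$ for a fixed direction $u$ and divides by $\|u\|_2^2$. Each ratio should be close to 1. The subgaussian half of the definition is not tested here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 20, 50_000
u = rng.standard_normal(n)  # an arbitrary fixed direction

samples = {
    "gaussian": rng.standard_normal((trials, n)),             # standard Gaussian vectors
    "bernoulli": rng.choice([-1.0, 1.0], size=(trials, n)),   # i.i.d. symmetric signs
}

# Isotropy: E|<x, u>|^2 = ||u||_2^2, so each ratio should be close to 1.
ratios = {name: np.mean((x @ u) ** 2) / np.dot(u, u) for name, x in samples.items()}
print(ratios)
```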
A random vector $x$ with independent subgaussian entries $x_1,\ldots,x_n$ is a subgaussian vector because [20]

$$\|\langle x,u\rangle\|_{\psi_2}\le c\|u\|_2\max_{i\le n}\|x_i\|_{\psi_2}. \qquad (8)$$

Clearly, if in addition the $\{x_i\}_{i\le n}$ are centered and have unit variance, then $x$ is also isotropic. In particular, the standard Gaussian vector on $\mathbb{R}^n$ and the sign vector with i.i.d. 1/2 Bernoulli entries are isotropic and subgaussian. Isotropic and subgaussian random vectors also include vectors distributed according to the normalized volume measure on various convex symmetric bodies, for example, the unit balls of $\ell_p^n$ for $2\le p\le\infty$ [21].

Clearly, any linear operator $\mathcal{A}:\mathbb{R}^{n_1\times n_2}\to\mathbb{R}^m$ can be represented by a collection of matrices $\mathfrak{A} = \{A_1,\ldots,A_m\}$ (refer to Section III-A for more details). Based on this representation of $\mathcal{A}$, we have the following definition of isotropic and subgaussian operators:

Definition 4 Suppose $\mathcal{A}:\mathbb{R}^{n_1\times n_2}\to\mathbb{R}^m$ is a linear operator with corresponding matrix representation $\mathfrak{A}$. We say $\mathcal{A}$ is from the isotropic and subgaussian ensemble if for each $A_i\in\mathfrak{A}$, $\operatorname{vec}(A_i)$ is an independent isotropic and subgaussian vector with constant $L$, and $L$ is a numerical constant independent of $n_1, n_2$.

Isotropic and subgaussian operators include operators with i.i.d. centered subgaussian entries of unit variance (Gaussian and Bernoulli entries in particular) as well as operators whose matrices $A_i$ ($\operatorname{vec}(A_i)$, more precisely) are independent copies of random vectors distributed according to the normalized volume measure of the unit balls of $\ell_p^{n_1n_2}$ for $2\le p\le\infty$.

For any isotropic and subgaussian operator $\mathcal{A}$, the typical values of $\rho_\tau^{\min}(\mathcal{A}/\sqrt{m})$ and $\rho_\tau^{\max}(\mathcal{A}/\sqrt{m})$ concentrate around 1 for relatively large $m$ (but $m\ll n_1n_2$). More precisely, we have the following theorem:

Theorem 1 Let $\mathcal{A}$ be an isotropic and subgaussian operator with some numerical constant $L$. Then there exist constants $c_1, c_2, c_3$ depending only on $L$ such that for any $\epsilon>0$ and $m\ge 1$ satisfying

$$m\ge c_1\frac{\tau n_2}{\epsilon^2}, \qquad (9)$$
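we have

$$1-c_2\epsilon\le\mathbb{E}\,\rho_\tau^{\min}\left(\frac{\mathcal{A}}{\sqrt{m}}\right)\le\mathbb{E}\,\rho_\tau^{\max}\left(\frac{\mathcal{A}}{\sqrt{m}}\right)\le 1+c_2\epsilon \qquad (10)$$

and

$$\mathbb{P}\left\{1-\epsilon\le\rho_\tau^{\min}\left(\frac{\mathcal{A}}{\sqrt{m}}\right)\le\rho_\tau^{\max}\left(\frac{\mathcal{A}}{\sqrt{m}}\right)\le 1+\epsilon\right\}\ge 1-\exp(-c_3\epsilon^2m). \qquad (11)$$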



Using the relation between the $\ell_*$-constrained singular values and the mRIC, we have the following immediate corollary, which was also established in [5] using different approaches:

Corollary 1 Under the conditions of Theorem 1, there exist numerical constants $c_1, c_2$ such that for any $\epsilon>0$ the mRIC $\delta_r(\mathcal{A}/\sqrt{m})$ satisfies

$$\mathbb{P}\left[\delta_r(\mathcal{A}/\sqrt{m})>\epsilon\right]\le\exp(-c_1\epsilon^2m) \qquad (12)$$

as long as

$$m\ge c_2\frac{rn_2}{\epsilon^2}. \qquad (13)$$



Corollary 1 was established in [5] using an $\epsilon$-net argument. The same procedure cannot be trivially generalized to prove Theorem 1. The idea behind an $\epsilon$-net argument is that any point in the set under consideration can be approximated by a point in its $\epsilon$-cover with an error of at most $\epsilon$. One key ingredient in the proof given in [5] is that the approximation error matrix also has low rank. This does not hold for the error matrix in our case, because the difference between two matrices with small $\ell_*$-ranks does not necessarily have a small $\ell_*$-rank. This difficulty might be circumvented by resorting to Dudley's inequality [22]. However, a good estimate of the covering number of the set $\{X\in\mathbb{R}^{n_1\times n_2}:\|X\|_F = 1,\ \|X\|_*^2\le\tau\}$ that makes Dudley's inequality tight enough is not readily available.

In Section V-A, we start with the Gaussian ensemble. We use comparison theorems for Gaussian processes, Gordon's inequality and Slepian's inequality in particular, to show that the expected values of the $\ell_*$-constrained singular values fall within a neighborhood of one. Then the concentration of measure phenomenon in Gauss space immediately yields the desired results of Theorem 1 for the Gaussian ensemble, as both $\rho_\tau^{\min}(\cdot)$ and $\rho_\tau^{\max}(\cdot)$ are Lipschitz functions. Gordon's and Slepian's inequalities rely heavily on the highly symmetric structure of Gaussian processes, which general subgaussian processes do not share.

The corresponding results for subgaussian operators are established in Section V-B. The problem is formulated as one of empirical processes. We then employ a recent general result on empirical processes established in [23]. The challenges of deriving Theorem 1 using more direct approaches, such as Dudley's inequality and/or the general generic chaining bound [24], are also discussed.

One common ground of the proofs for the Gaussian case and the more general subgaussian case reveals why $\mathcal{A}$ is invertible on $\{\tau(X)\le\tau\}$ while it is far from invertible on $\mathbb{R}^{n_1\times n_2}$. Both proofs rely on the fact that the canonical Gaussian process indexed by the intersection of $\{\tau(X)\le\tau\}$ and the unit sphere of $(\mathbb{R}^{n_1\times n_2},\|\cdot\|_F)$ cannot drift too far from zero. This essentially means that the set $\{\tau(X)\le\tau\}\cap\{\|X\|_F = 1\}$ with a small $\tau$ is significantly smaller than the unit sphere itself, on which the canonical Gaussian process would drift far away from zero. Refer to Section V for more precise statements of these ideas.

B. Stability of Convex Relaxation Algorithms 

In this section, we present the stability results for three convex relaxation algorithms: the matrix Basis Pursuit, the matrix Dantzig selector, and the matrix LASSO estimator. These algorithms are reviewed briefly in Section III-C. As one will see in the proofs of Theorems 2, 3 and 4, establishing these theorems takes two steps:

1) Show that the error matrix $H = \hat X - X$ has a small $\ell_*$-rank, $\tau(H)\le\tau$ for some suitably selected $\tau$, which automatically leads to the lower bound $\|\mathcal{A}(H)\|_2\ge\rho_\tau\|H\|_F$. Here $X$ is the true matrix and $\hat X$ is its estimate given by a convex relaxation algorithm.

2) Obtain an upper bound on $\|\mathcal{A}(H)\|_2$.

Both steps are relatively easy to carry out for the matrix Basis Pursuit algorithm. We have the following stability result:

Theorem 2 If the matrix $X$ has rank $r$ and the noise $w$ is bounded, that is, $\|w\|_2\le\epsilon$, then the solution $\hat X$ to the matrix Basis Pursuit (27) obeys

$$\|\hat X - X\|_F\le\frac{2\epsilon}{\rho_{8r}}. \qquad (14)$$
The corresponding bound (36) using the mRIC is $\frac{4\sqrt{1+\delta_{4r}}}{1-(1+\sqrt{2})\delta_{4r}}\,\epsilon$, valid under the condition $\delta_{4r}<\sqrt{2}-1$. Here $\delta_r$ is the mRIC defined in Definition 5. We note that the $\ell_*$-CMSV bound (14) is more concise and only requires $\rho_{8r}>0$. Of course, we pay a price by replacing the subscript $4r$ with $8r$. A similar phenomenon is also observed in the sparse signal reconstruction case [16]. As suggested by the numerical simulations in [16], by analogy we expect that it is easier to achieve $\rho_{8r}>0$ than $\delta_{4r}<\sqrt{2}-1$. However, we did not run simulations in this paper, because it is not clear how to compute the mRIC $\delta_{4r}$ within a reasonable time even for small-scale problems.

Before stating the results for the matrix Dantzig selector and the matrix LASSO estimator, we cite a lemma from [5]:

Lemma 1 [5, Lemma 1.1] Suppose $w\sim\mathcal{N}(0,\sigma^2I_m)$. If $C>4\sqrt{(1+\rho_1^{\max}(\mathcal{A}))\log 12}$, then there exists a numerical constant $c>0$ such that, with probability greater than $1-2\exp(-cn_2)$,

$$\|\mathcal{A}^*(w)\|_2\le C\sqrt{n_2}\,\sigma, \qquad (15)$$

where $\mathcal{A}^*$ is the adjoint operator of $\mathcal{A}$.

Lemma 1 allows us to transform statements made under the condition $\|\mathcal{A}^*(w)\|_2\le\lambda$, e.g., Theorems 3 and 4, into statements that hold with high probability. We now present the error bounds for the matrix Dantzig selector and the matrix LASSO estimator, whose proofs can be found in Sections IV-B and IV-C, respectively.

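Theorem 3 Suppose the noise vector in model (1) satisfies $\|\mathcal{A}^*(w)\|_2\le\lambda$, and suppose $X\in\mathbb{R}^{n_1\times n_2}$ is of rank $r$. Then the solution $\hat X$ to the matrix Dantzig selector (28) satisfies

$$\|\hat X - X\|_F\le\frac{4\sqrt{2}}{\rho_{8r}^2}\sqrt{r}\,\lambda. \qquad (16)$$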

Theorem 4 Suppose the noise vector in model (1) satisfies $\|\mathcal{A}^*(w)\|_2\le\kappa\mu$ for some $\kappa\in(0,1)$, and suppose $X\in\mathbb{R}^{n_1\times n_2}$ is of rank $r$. Then the solution $\hat X$ to the matrix LASSO estimator (29) satisfies

$$\|\hat X - X\|_F\le\frac{1+\kappa}{1-\kappa}\cdot\frac{2\sqrt{2}}{\rho^2_{8r/(1-\kappa)^2}}\sqrt{r}\,\mu. \qquad (17)$$
For example, if we take $\kappa = 1-2\sqrt{2}/3$, then $(1-\kappa)^2 = 8/9$ and the bound becomes

$$\|\hat X - X\|_F\le 6\left(1-\frac{\sqrt{2}}{3}\right)\frac{1}{\rho_{9r}^2}\sqrt{r}\,\mu. \qquad (18)$$

The reader is encouraged to compare the statements of Theorems 3 and 4 with those using the mRIC as cited in Section III-D (Equations (37), (38) and the conditions for them to be valid).
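The bounds in Theorems 2-4 are explicit once an $\ell_*$-CMSV value is available. The plain-Python sketch below evaluates (14), (16) and (17) and confirms the arithmetic behind (18): with $\kappa = 1-2\sqrt{2}/3$, the CMSV level $8r/(1-\kappa)^2$ equals $9r$ and the leading constant equals $6(1-\sqrt{2}/3)$. The function names are hypothetical. The CMSV arguments are inputs the caller must supply, since this paper does not provide a way to compute them.

```python
import math


def bp_bound(eps, rho_8r):
    """Theorem 2, (14): ||X_hat - X||_F <= 2 eps / rho_{8r}."""
    return 2.0 * eps / rho_8r


def ds_bound(lam, r, rho_8r):
    """Theorem 3, (16): 4 sqrt(2) sqrt(r) lam / rho_{8r}^2."""
    return 4.0 * math.sqrt(2.0) * math.sqrt(r) * lam / rho_8r ** 2


def lasso_tau(r, kappa):
    """l_*-rank level 8r / (1 - kappa)^2 at which the CMSV in (17) is evaluated."""
    return 8.0 * r / (1.0 - kappa) ** 2


def lasso_bound(mu, r, kappa, rho):
    """Theorem 4, (17); rho must be rho_tau with tau = lasso_tau(r, kappa)."""
    if not 0.0 < kappa < 1.0:
        raise ValueError("kappa must lie in (0, 1)")
    return (1 + kappa) / (1 - kappa) * 2.0 * math.sqrt(2.0) * math.sqrt(r) * mu / rho ** 2


kappa = 1.0 - 2.0 * math.sqrt(2.0) / 3.0
print(lasso_tau(1, kappa))                                       # 9.0 up to rounding: rho_{9r}
print(lasso_bound(1.0, 1, kappa, 1.0), 6 * (1 - math.sqrt(2) / 3))  # the same constant
```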

III. Notations, Measurement Model, Reconstruction Algorithms, and Matrix Restricted Isometry Constant

A. Notations 

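We use $\ell_p^m$ to denote the space $\mathbb{R}^m$ equipped with the $\ell_p$ norm $\|\cdot\|_p$ defined as

$$\|x\|_p = \left(\sum_{k\le m}|x_k|^p\right)^{1/p}\quad\text{for } 1\le p<\infty \qquad (19)$$

and

$$\|x\|_\infty = \max_{k\le m}|x_k|\quad\text{for } p = \infty. \qquad (20)$$

The notation $\|x\|_0$ counts the number of nonzero elements of $x$.

Suppose $X = [x_1\ x_2\ \ldots\ x_{n_2}]\in\mathbb{R}^{n_1\times n_2}$. Define the Frobenius norm of $X$ as $\|X\|_F = \sqrt{\sum_i\|x_i\|_2^2} = \sqrt{\sum_i\sigma_i^2(X)}$, the nuclear norm as $\|X\|_* = \sum_i\sigma_i(X)$, and the operator norm as $\|X\|_2 = \max_i\{\sigma_i(X)\}$, where $\sigma_i(X)$ is the $i$th singular value of $X$. The rank of $X$ is denoted by $\operatorname{rank}(X) = \#\{i:\sigma_i(X)\neq 0\}$.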

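If we use $\sigma(X) = [\sigma_1(X)\ \sigma_2(X)\ \ldots\ \sigma_{n_1}(X)]^T$ to represent the singular value vector, then clearly we have the following relations:

$$\|X\|_F = \|\sigma(X)\|_2,\quad \|X\|_* = \|\sigma(X)\|_1,\quad \|X\|_2 = \|\sigma(X)\|_\infty,\quad \operatorname{rank}(X) = \|\sigma(X)\|_0. \qquad (21)$$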

Note that we use $\|\cdot\|_2$ to represent both the matrix operator norm and the $\ell_2$ norm of a vector. The exact meaning can always be inferred from the context. These singular value vector representations of the matrix norms immediately lead to

$$\|X\|_2\le\|X\|_F\le\|X\|_*\le\sqrt{\operatorname{rank}(X)}\,\|X\|_F\le\operatorname{rank}(X)\,\|X\|_2.$$
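Relations (21) and the chain of inequalities above can be checked numerically with NumPy, whose `np.linalg.norm` accepts `"fro"`, `"nuc"` and `2` for matrices. The sketch uses an arbitrary random rank-2 matrix. The numerical rank is taken with an explicit relative tolerance, because computed singular values of a rank-deficient matrix are tiny rather than exactly zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r = 5, 7, 2
X = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))  # rank-2 matrix

s = np.linalg.svd(X, compute_uv=False)            # sigma(X), decreasing order
fro = np.linalg.norm(X, "fro")
nuc = np.linalg.norm(X, "nuc")
op = np.linalg.norm(X, 2)                         # operator norm = largest singular value
rank = np.linalg.matrix_rank(X, tol=1e-8 * s[0])  # ||sigma(X)||_0 up to a tolerance

# Relations (21): the matrix norms are vector norms of sigma(X).
assert np.isclose(fro, np.linalg.norm(s, 2))
assert np.isclose(nuc, np.linalg.norm(s, 1))
assert np.isclose(op, np.linalg.norm(s, np.inf))

# The chain of inequalities that follows from (21).
assert op <= fro <= nuc <= np.sqrt(rank) * fro <= rank * op
print(rank, op, fro, nuc)
```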

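The vectorization operator $\operatorname{vec}(X) = [x_1^T\ x_2^T\ \ldots\ x_{n_2}^T]^T$ stacks the columns of $X$ into a long column vector. The inner product of two matrices $X_1, X_2\in\mathbb{R}^{n_1\times n_2}$ is defined as

$$\langle X_1,X_2\rangle = \operatorname{trace}(X_1^TX_2) = \operatorname{vec}(X_1)^T\operatorname{vec}(X_2) = \sum_{i,j}(X_1)_{ij}(X_2)_{ij}. \qquad (22)$$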



The following Cauchy-Schwarz type inequalities are due to the fact that the dual norm of the Frobenius norm is the Frobenius norm, and the dual norm of the operator norm is the nuclear norm [3]:

$$\langle X_1,X_2\rangle\le\|X_1\|_F\|X_2\|_F,\qquad \langle X_1,X_2\rangle\le\|X_1\|_*\|X_2\|_2.$$
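A short NumPy check of (22) and of the two dual-norm inequalities. Note that `flatten(order="F")` stacks columns, matching $\operatorname{vec}$. The last line uses a standard fact not stated in the text: the nuclear-norm bound is attained by $X_2 = UV^T$ taken from the thin SVD $X_1 = USV^T$, since $\langle X_1, UV^T\rangle = \operatorname{trace}(S) = \|X_1\|_*$ and $\|UV^T\|_2 = 1$.

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.standard_normal((4, 6))
X2 = rng.standard_normal((4, 6))

ip_trace = np.trace(X1.T @ X2)
ip_vec = X1.flatten(order="F") @ X2.flatten(order="F")  # vec() stacks columns
ip_entry = np.sum(X1 * X2)

# The three expressions in (22) agree.
assert np.isclose(ip_trace, ip_vec) and np.isclose(ip_trace, ip_entry)

# Cauchy-Schwarz type inequalities from the dual pairs (F, F) and (*, 2).
assert ip_trace <= np.linalg.norm(X1, "fro") * np.linalg.norm(X2, "fro")
assert ip_trace <= np.linalg.norm(X1, "nuc") * np.linalg.norm(X2, 2)

# The nuclear-norm bound is attained by X2 = U V^T from the thin SVD X1 = U S V^T.
U, S, Vt = np.linalg.svd(X1, full_matrices=False)
print(np.sum(X1 * (U @ Vt)), np.linalg.norm(X1, "nuc"))  # equal up to rounding
```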

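For any linear operator $\mathcal{A}:\mathbb{R}^{n_1\times n_2}\to\mathbb{R}^m$, its adjoint operator $\mathcal{A}^*:\mathbb{R}^m\to\mathbb{R}^{n_1\times n_2}$ is defined by the relation

$$\langle\mathcal{A}(X),z\rangle = \langle X,\mathcal{A}^*(z)\rangle,\quad\forall X\in\mathbb{R}^{n_1\times n_2},\ z\in\mathbb{R}^m. \qquad (23)$$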

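A linear operator $\mathcal{A}:\mathbb{R}^{n_1\times n_2}\to\mathbb{R}^m$ can be represented by $m$ matrices $\mathfrak{A} = \{A_1, A_2,\ldots,A_m\}\subset\mathbb{R}^{n_1\times n_2}$ as follows:

$$\mathcal{A}(X) = \begin{bmatrix}\langle A_1,X\rangle\\ \langle A_2,X\rangle\\ \vdots\\ \langle A_m,X\rangle\end{bmatrix}. \qquad (24)$$

We will interchangeably use $\mathcal{A}$ and $\mathfrak{A}$ to represent the same linear operator. The adjoint operation for this representation is given by

$$\mathcal{A}^*(z) = \sum_{k=1}^m z_kA_k\in\mathbb{R}^{n_1\times n_2}. \qquad (25)$$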
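Representation (24) and adjoint (25) translate directly into array operations if the matrices $A_k$ are stored in one array of shape $(m, n_1, n_2)$. This storage layout and the helper name `make_operator` are our assumptions for illustration. The sketch verifies the defining identity (23) on random inputs.

```python
import numpy as np


def make_operator(A_mats):
    """Return A and its adjoint A* for the representation {A_1, ..., A_m}
    stored as an array of shape (m, n1, n2), following (24) and (25)."""
    def A(X):
        return np.einsum("kij,ij->k", A_mats, X)  # [<A_k, X>]_k

    def A_adj(z):
        return np.einsum("k,kij->ij", z, A_mats)  # sum_k z_k A_k

    return A, A_adj


rng = np.random.default_rng(2)
m, n1, n2 = 30, 4, 5
A_mats = rng.standard_normal((m, n1, n2))
A, A_adj = make_operator(A_mats)

X = rng.standard_normal((n1, n2))
z = rng.standard_normal(m)
lhs = A(X) @ z                # <A(X), z>
rhs = np.sum(X * A_adj(z))    # <X, A*(z)>
print(np.isclose(lhs, rhs))   # True: adjoint identity (23)
```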
In the derivations of this paper, we frequently use $c$, $C$, $c_1$, $c_2$, $c_3$ to represent numerical constants. For notational simplicity, the same symbol $c$ may represent different constants in different arguments.

B. The Measurement Model 

The following measurement model is used throughout the paper. Suppose we have a matrix $X\in\mathbb{R}^{n_1\times n_2}$ with $\operatorname{rank}(X) = r\ll n_1$. We observe $y\in\mathbb{R}^m$ through the following linear model:

$$y = \mathcal{A}(X)+w, \qquad (26)$$

where $\mathcal{A}:\mathbb{R}^{n_1\times n_2}\to\mathbb{R}^m$ is a linear operator and $w\in\mathbb{R}^m$ is the noise/disturbance vector, either deterministic or random. In the deterministic setting we assume boundedness, $\|w\|_2\le\epsilon$, while in the stochastic setting we assume Gaussianity, $w\sim\mathcal{N}(0,\sigma^2I_m)$. The model (26) is generally underdetermined with $m\ll n_1n_2$, and the rank $r$ of $X$ is very small.
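A minimal simulation of model (26) in the deterministic (bounded-noise) setting may help fix ideas. All sizes, the noise bound, and the choice of a Gaussian operator with $\mathcal{N}(0,1/m)$ entries are illustrative assumptions. The noise vector is a random direction rescaled to have norm exactly $\epsilon$.

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2, r, m = 10, 12, 2, 80  # illustrative sizes with m < n1 * n2
eps = 0.1                     # hypothetical noise bound

# Rank-r ground truth X = L R^T.
X = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))
rank_X = np.linalg.matrix_rank(X, tol=1e-8 * np.linalg.norm(X, 2))

# Gaussian operator: each A_k has i.i.d. N(0, 1/m) entries.
A_mats = rng.standard_normal((m, n1, n2)) / np.sqrt(m)

# Deterministic setting: any w with ||w||_2 <= eps; here a random direction of length eps.
w = rng.standard_normal(m)
w *= eps / np.linalg.norm(w)

y = np.einsum("kij,ij->k", A_mats, X) + w  # model (26)
print(y.shape, rank_X, np.linalg.norm(w))
```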

A fundamental problem pertaining to model (26) is to reconstruct the low-rank matrix $X$ from the measurement $y$ by exploiting the low-rank property of $X$, and to assess the stability of the reconstruction with respect to noise. For any reconstruction algorithm, we denote the estimate of $X$ by $\hat X$ and the error matrix by $H\stackrel{\text{def}}{=}\hat X - X$. In this paper, the stability problem aims to bound $\|H\|_F$ in terms of $m$, $n_1$, $n_2$, $r$, the linear operator $\mathcal{A}$, and the noise strength $\epsilon$ or $\sigma^2$.

C. Reconstruction Algorithms 

We briefly review three low-rank matrix recovery algorithms based on convex relaxation: the matrix Basis Pursuit, the matrix Dantzig selector, and the matrix LASSO estimator. A common theme of these algorithms is enforcing the low-rankness of solutions by penalizing large nuclear norms, or equivalently, large $\ell_1$ norms of the singular value vectors. As a relaxation of the matrix rank, the nuclear norm remains a measure of low-rankness while being a convex function. In fact, the nuclear norm $\|\cdot\|_*$ is the convex envelope of $\operatorname{rank}(\cdot)$ on the set $\{X\in\mathbb{R}^{n_1\times n_2}:\|X\|_2\le 1\}$ [3, Theorem 2.2]. Most computational advantages of the aforementioned three algorithms result from the convexity of the nuclear norm. As demonstrated in Section IV, the nuclear norm penalty guarantees that the error matrix has a small $\ell_*$-rank.

The matrix Basis Pursuit algorithm [3], [5] minimizes the nuclear norm of the solution subject to the measurement constraint. It is applicable to both noiseless settings and bounded noise settings with a known noise bound $\epsilon$. The matrix Basis Pursuit algorithm was originally developed for the noise-free case in [3], i.e., $\epsilon = 0$ in (27). In this paper, we refer to both cases as matrix Basis Pursuit. Mathematically, the matrix Basis Pursuit solves

$$\text{mBP:}\quad\min_Z\ \|Z\|_*\quad\text{subject to}\quad\|y-\mathcal{A}(Z)\|_2\le\epsilon. \qquad (27)$$

The matrix Dantzig selector [5] reconstructs a low-rank matrix when its linear measurements are corrupted by unbounded noise. Its estimate of $X$ is the solution to the nuclear norm regularization problem

$$\text{mDS:}\quad\min_Z\ \|Z\|_*\quad\text{subject to}\quad\|\mathcal{A}^*(r)\|_2\le\lambda, \qquad (28)$$

where $r = y-\mathcal{A}(Z)$ is the residual vector and $\lambda$ is a control parameter, which in the Gaussian noise setting is chosen according to the noise standard deviation $\sigma$ (cf. Lemma 1).

The matrix LASSO estimator solves the following optimization problem [5], [25]:

$$\text{mLASSO:}\quad\min_Z\ \frac{1}{2}\|y-\mathcal{A}(Z)\|_2^2+\mu\|Z\|_*. \qquad (29)$$

All three optimization problems are convex and can be solved by convex programming.
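As one concrete illustration of solving (29), the sketch below uses proximal gradient descent with singular value soft-thresholding. This is a standard first-order technique chosen here for illustration; it is not the solver used or proposed in this paper. The function names, problem sizes, and $\mu$ are hypothetical. With step size $1/L$, where $L = \|M\|_2^2$ is the Lipschitz constant of the gradient of the quadratic term, the objective is non-increasing across iterations.

```python
import numpy as np


def svt(Z, thresh):
    """Singular value soft-thresholding: the proximal map of thresh * ||.||_*."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - thresh, 0.0)) @ Vt


def mlasso_prox_grad(A_mats, y, mu, iters=500):
    """Proximal gradient for (29): min_Z 0.5 ||y - A(Z)||_2^2 + mu ||Z||_*."""
    m, n1, n2 = A_mats.shape
    M = A_mats.reshape(m, n1 * n2)            # row k = A_k flattened (row-major)
    step = 1.0 / np.linalg.norm(M, 2) ** 2    # 1 / Lipschitz constant of the gradient
    Z = np.zeros((n1, n2))
    objective = []
    for _ in range(iters):
        resid = M @ Z.ravel() - y             # A(Z) - y
        objective.append(0.5 * resid @ resid + mu * np.linalg.norm(Z, "nuc"))
        grad = (M.T @ resid).reshape(n1, n2)  # A*(A(Z) - y)
        Z = svt(Z - step * grad, step * mu)
    return Z, objective


rng = np.random.default_rng(4)
n1, n2, r, m = 8, 10, 1, 60
X = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))
A_mats = rng.standard_normal((m, n1, n2)) / np.sqrt(m)
y = np.einsum("kij,ij->k", A_mats, X) + 0.01 * rng.standard_normal(m)
X_hat, obj = mlasso_prox_grad(A_mats, y, mu=0.05)
print(np.linalg.norm(X_hat - X, "fro") / np.linalg.norm(X, "fro"))
```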

D. Matrix Restricted Isometry Constant 

In this section, we review existing stability results for the matrix Basis Pursuit, the matrix Dantzig selector, and the matrix LASSO estimator. The aim of stability analysis is to derive error bounds for the solutions of these algorithms. These bounds usually involve the incoherence of the linear operator $\mathcal{A}$, which is measured by the mRIC defined below [3], [5]:

Definition 5 For each integer $r\in\{1,\ldots,n_1\}$, the matrix restricted isometry constant (mRIC) $\delta_r$ of a linear operator $\mathcal{A}:\mathbb{R}^{n_1\times n_2}\to\mathbb{R}^m$ is defined as the smallest $\delta>0$ such that

$$1-\delta\le\frac{\|\mathcal{A}(X)\|_2^2}{\|X\|_F^2}\le 1+\delta \qquad (30)$$

holds for an arbitrary non-zero matrix $X$ of rank at most $r$.

A small $\delta_r$ roughly means that $\mathcal{A}$ is nearly an isometry when restricted to all matrices of rank at most $r$. Hence, it is no surprise that the mRIC is involved in the stability of recovering $X$ from $\mathcal{A}(X)$ corrupted by noise when $X$ is of rank at most $r$.
The Rayleigh quotient $\|\mathcal{A}(X)\|_2^2/\|X\|_F^2$ in the definition of the mRIC motivates us to define the rank-constrained singular values, which are closely related to the mRIC.

Definition 6 For any integer $1\le r\le n_1$ and linear operator $\mathcal{A}:\mathbb{R}^{n_1\times n_2}\to\mathbb{R}^m$, define the $r$-rank constrained minimal singular value $\nu_r^{\min}$ and the $r$-rank constrained maximal singular value $\nu_r^{\max}$ of $\mathcal{A}$ via

$$\nu_r^{\min}(\mathcal{A}) = \inf_{X\neq 0:\ \operatorname{rank}(X)\le r}\frac{\|\mathcal{A}(X)\|_2}{\|X\|_F}\quad\text{and} \qquad (31)$$

$$\nu_r^{\max}(\mathcal{A}) = \sup_{X\neq 0:\ \operatorname{rank}(X)\le r}\frac{\|\mathcal{A}(X)\|_2}{\|X\|_F}, \qquad (32)$$

respectively.

The mRIC $\delta_r$ of a linear operator $\mathcal{A}$ is related to the $r$-rank constrained minimal and maximal singular values by

$$\delta_r = \max\left\{|1-(\nu_r^{\min})^2|,\ |(\nu_r^{\max})^2-1|\right\}. \qquad (33)$$

We comment that, similar to the vector RIC, the exact computation of the mRIC is extremely difficult. Equation (3) implies that $\{X\neq 0:\operatorname{rank}(X)\le r\}\subset\{X\neq 0:\tau(X)\le r\}$. As a consequence, the rank-constrained and $\ell_*$-constrained singular values satisfy the inequality

$$\rho_r^{\min}\le\nu_r^{\min}\le\nu_r^{\max}\le\rho_r^{\max}, \qquad (34)$$

which combined with (33) yields the following relationship between the mRIC and the $\ell_*$-constrained singular values:

$$\delta_r\le\max\left\{|1-(\rho_r^{\min})^2|,\ |(\rho_r^{\max})^2-1|\right\}. \qquad (35)$$
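Inequality (35) is a one-line computation once $\rho_r^{\min}$ and $\rho_r^{\max}$ are known. The sketch below uses a hypothetical function name and assumed input values. It also shows the arithmetic behind deriving Corollary 1 from Theorem 1: if both values lie in $[1-\epsilon, 1+\epsilon]$, the right-hand side of (35) is at most $(1+\epsilon)^2-1 = 2\epsilon+\epsilon^2$.

```python
def mric_upper_bound(rho_min, rho_max):
    """Right-hand side of (35): an upper bound on delta_r computed from the
    l_*-constrained singular values rho_r^min <= rho_r^max (supplied by the caller)."""
    if not 0.0 <= rho_min <= rho_max:
        raise ValueError("need 0 <= rho_min <= rho_max")
    return max(abs(1.0 - rho_min ** 2), abs(rho_max ** 2 - 1.0))


eps = 0.1
# Both values within [1 - eps, 1 + eps] give a bound of at most 2 * eps + eps ** 2.
print(mric_upper_bound(1 - eps, 1 + eps), 2 * eps + eps ** 2)  # both about 0.21
```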

Now we cite stability results for the matrix Basis Pursuit, the matrix Dantzig selector, and the matrix LASSO estimator that are expressed in terms of the mRIC. Assume $X$ is of rank $r$ and $\hat X$ is its estimate given by any of the three algorithms; then we have the following:

1) matrix Basis Pursuit [5]: Suppose that $\delta_{4r}<\sqrt{2}-1$ and $\|w\|_2\le\epsilon$. The solution to the matrix Basis Pursuit (27) satisfies

$$\|\hat X - X\|_F\le\frac{4\sqrt{1+\delta_{4r}}}{1-(1+\sqrt{2})\delta_{4r}}\,\epsilon. \qquad (36)$$



June 22, 2010 



DRAFT 



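2) matrix Dantzig selector [5]: If $\delta_{4r}<\sqrt{2}-1$ and $\|\mathcal{A}^*(w)\|_2\le\lambda$, then

$$\|\hat X - X\|_F\le\frac{16}{1-(\sqrt{2}+1)\delta_{4r}}\sqrt{r}\,\lambda. \qquad (37)$$

3) matrix LASSO estimator [5]: If $\delta_{4r}<(3\sqrt{2}-1)/17$ and $\|\mathcal{A}^*(w)\|_2\le\mu/2$, then the solution to the matrix LASSO (29) satisfies

$$\|\hat X - X\|_F\le C(\delta_{4r})\sqrt{r}\,\mu, \qquad (38)$$

for some numerical constant $C(\delta_{4r})$ that depends only on $\delta_{4r}$.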
Although the mRIC provides a measure quantifying the goodness of a linear operator, its computation, as mentioned earlier, poses great challenges. In the literature, the computation issue is circumvented by resorting to a random argument. We cite one general result below [5]:

• Let $\mathcal{A}:\mathbb{R}^{n_1\times n_2}\to\mathbb{R}^m$ be a random linear operator satisfying, for any $X\in\mathbb{R}^{n_1\times n_2}$ and $0<\epsilon<1$, the concentration inequality

$$\mathbb{P}\left(\left|\|\mathcal{A}(X)\|_2^2-\|X\|_F^2\right|\ge\epsilon\|X\|_F^2\right)\le Ce^{-mc_0(\epsilon)} \qquad (39)$$

for a fixed constant $C>0$. Then, for any given $\delta\in(0,1)$, there exist constants $c_1, c_2>0$ depending only on $\delta$ such that $\delta_r\le\delta$ with probability not less than $1-Ce^{-c_1m}$, as long as

$$m\ge c_2nr, \qquad (40)$$

where $n = \max\{n_1,n_2\}$.
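Condition (39) concerns one fixed matrix at a time, and a Gaussian operator with i.i.d. $\mathcal{N}(0,1/m)$ entries is the textbook example satisfying it. The sketch below draws many independent operators for a fixed unit-Frobenius-norm $X$ and reports the mean of $\|\mathcal{A}(X)\|_2^2$ together with the fraction of trials whose deviation reaches $\epsilon$. That fraction shrinks as $m$ grows. It illustrates only the fixed-$X$ statement (39), not the uniform conclusion over all rank-$r$ matrices; sizes and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
n1, n2, trials, eps = 6, 8, 200, 0.2
X = rng.standard_normal((n1, n2))
X /= np.linalg.norm(X, "fro")  # normalize so that ||X||_F = 1

results = {}
for m in (20, 80, 320):
    # One fresh operator per trial, with i.i.d. N(0, 1/m) entries so E||A(X)||^2 = ||X||_F^2.
    A_mats = rng.standard_normal((trials, m, n1, n2)) / np.sqrt(m)
    ratios = (np.einsum("tkij,ij->tk", A_mats, X) ** 2).sum(axis=1)  # ||A(X)||_2^2
    # Fraction of trials in which the deviation event of (39) occurs.
    results[m] = (ratios.mean(), np.mean(np.abs(ratios - 1.0) >= eps))
print(results)
```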

IV. Stability of Convex Relaxation Based on the $\ell_*$-Constrained Minimal Singular Value

In this section, we derive bounds on the reconstruction error for the matrix Basis Pursuit, the matrix Dantzig selector, and the matrix LASSO estimator. As shown in Theorems 2, 3 and 4, our bounds are given in terms of the $\ell_*$-CMSV rather than the mRIC of the linear operator $\mathcal{A}$.

A. Basis Pursuit 

In this section, we establish a bound on the Frobenius norm of the error matrix of the matrix Basis Pursuit using the $\ell_*$-CMSV. Recall the two steps discussed in Section II-B:

1) Show that the error matrix $H = \hat X - X$ has a small $\ell_*$-rank, $\tau(H)\le 8r$, which automatically leads to the lower bound $\|\mathcal{A}(H)\|_2\ge\rho_{8r}\|H\|_F$;

2) Obtain an upper bound on $\|\mathcal{A}(H)\|_2$.

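For the matrix Basis Pursuit (27), the second step is trivial, as both $X$ and $\hat X$ satisfy the constraint $\|y-\mathcal{A}(Z)\|_2\le\epsilon$ in (27). Therefore, the triangle inequality yields

$$\|\mathcal{A}(H)\|_2 = \|\mathcal{A}(\hat X - X)\|_2\le\|\mathcal{A}(\hat X)-y\|_2+\|y-\mathcal{A}(X)\|_2\le 2\epsilon. \qquad (41)$$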

In order to establish that the error matrix has a small $\ell_*$-rank in the first step, we present two lemmas on the properties of nuclear norms derived in [3]:

Lemma 2 [3, Lemma 2.3] Let $A$ and $B$ be matrices of the same dimensions. If $AB^T = 0$ and $A^TB = 0$, then $\|A+B\|_* = \|A\|_*+\|B\|_*$.

Lemma 3 [3, Lemma 3.4] Let $A$ and $B$ be matrices of the same dimensions. Then there exist matrices $B_1$ and $B_2$ such that

1) $B = B_1+B_2$;

2) $\operatorname{rank}(B_1)\le 2\operatorname{rank}(A)$;

3) $AB_2^T = 0$ and $A^TB_2 = 0$;

4) $\langle B_1,B_2\rangle = 0$.
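One way to construct the split in Lemma 3 is the following (this is our reading of the standard block argument and is not spelled out in this paper). Take the full SVD $A = U\Sigma V^T$, partition $\hat B = U^TBV$ into blocks according to the rank $r$ of $A$, and let $B_2$ keep only the lower-right block. The NumPy sketch below (hypothetical function name; numerical rank decided by a relative tolerance) implements this split and checks properties 2)-4) together with the conclusion of Lemma 2 for the pair $(A, B_2)$.

```python
import numpy as np


def lemma3_split(A, B, tol=1e-10):
    """Split B = B1 + B2 as in Lemma 3: B2 keeps only the block of U^T B V that is
    orthogonal to both the column space and the row space of A."""
    U, s, Vt = np.linalg.svd(A)                   # full SVD: U is n1 x n1, Vt is n2 x n2
    r = int(np.sum(s > tol * max(s.max(), 1.0)))  # numerical rank of A
    Bhat = U.T @ B @ Vt.T
    Bhat2 = np.zeros_like(Bhat)
    Bhat2[r:, r:] = Bhat[r:, r:]                  # lower-right block
    B2 = U @ Bhat2 @ Vt
    return B - B2, B2


def nuc(M):
    return np.linalg.norm(M, "nuc")


rng = np.random.default_rng(6)
n1, n2, r = 6, 9, 2
A = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))
B = rng.standard_normal((n1, n2))
B1, B2 = lemma3_split(A, B)

print(np.linalg.matrix_rank(B1, tol=1e-8))                  # at most 2r = 4
print(np.allclose(A @ B2.T, 0), np.allclose(A.T @ B2, 0))   # property 3)
print(np.isclose(np.sum(B1 * B2), 0.0))                     # property 4)
print(np.isclose(nuc(A + B2), nuc(A) + nuc(B2)))            # Lemma 2
```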

Now we give a proof of Theorem 2: 

Proof of Theorem 2: We decompose the error matrix $B = H$ according to Lemma 3 with $A = X$. More explicitly, we have:

1) $H = H_0+H_c$;

2) $\operatorname{rank}(H_0)\le 2\operatorname{rank}(X) = 2r$;

3) $XH_c^T = 0$ and $X^TH_c = 0$;

4) $\langle H_0,H_c\rangle = 0$.

As observed by Recht et al. in [3] (see also [26], [5] and [16]), the fact that $\|\hat X\|_* = \|X+H\|_*$ is the minimum among all $Z$ satisfying the constraint in (27) implies that $\|H_c\|_*$ cannot be very large. To see this, we observe that

$$\|X\|_*\ge\|X+H\|_* = \|X+H_c+H_0\|_*\ge\|X+H_c\|_*-\|H_0\|_* = \|X\|_*+\|H_c\|_*-\|H_0\|_*. \qquad (42)$$

Here, for the last equality we used Lemma 2 and $XH_c^T = 0$, $X^TH_c = 0$. Therefore, we obtain

$$\|H_c\|_*\le\|H_0\|_*, \qquad (43)$$

which leads to

$$\|H\|_*\le\|H_0\|_*+\|H_c\|_*\le 2\|H_0\|_*\le 2\sqrt{\operatorname{rank}(H_0)}\,\|H_0\|_F\le 2\sqrt{2r}\,\|H\|_F, \qquad (44)$$

where for the next-to-last inequality we used the fact that $\|A\|_*\le\sqrt{\operatorname{rank}(A)}\,\|A\|_F$ for any matrix $A$, and for the last inequality we used $\operatorname{rank}(H_0)\le 2r$ together with the Pythagorean theorem $\|H\|_F^2 = \|H_0\|_F^2+\|H_c\|_F^2\ge\|H_0\|_F^2$, which holds because $\langle H_0,H_c\rangle = 0$. Squaring both sides and dividing by $\|H\|_F^2$ shows that inequality (44) is equivalent to

$$\tau(H)\le 8\operatorname{rank}(X) = 8r. \qquad (45)$$

It follows from (41) and Definition 2 that

$$\rho_{8r}\|H\|_F\le\|\mathcal{A}(H)\|_2\le 2\epsilon. \qquad (46)$$

Hence, we get the conclusion of Theorem 2:

$$\|\hat X - X\|_F\le\frac{2\epsilon}{\rho_{8r}}. \qquad (47)$$
B. Dantzig Selector

This subsection is devoted to the proof of Theorem 3: 

Proof of Theorem 3: Suppose X E M"i^"2 jg Qf j-^j^j^ x is the solution to the matrix 

Dantzig selector (28). Define H = X — X . We note that to obtain that H has a small i^^-mnk 
(45), we used only two conditions: 

• \\X\\^ = \\X+H\\^ is the minimum among all matrices satisfying the optimization constraint; 

• the true signal X satisfies the constraint. 

Obviously, the first condition holds simply because of the structure of the matrix Dantzig selector. 
If the noise vector w satisfies ||^(if;)||2 < A, then the true signal X also satisfy the constraint: 



Consequently, we have t{H) < 8r following the same procedure as in the Proof of Theorem 2 
in Section IV-A, or equivalently. 



We now turn to the second step to obtain an upper bound on ||^(X)||f. The condition 
||^('"^)||2 < A and the constraint in the Dantzig selector (28) yield 



A*{y-A{Xm, 



\\A*{w)h<X. 



(48) 




(49) 



\\A*{Amh < 2A 



(50) 



because 



A*{w -r) = A* ((2/ - A{X)) - {y - ^(X)) 



) 




(51) 



where r = y — A{X) is the residual corresponding to the matrix Dantzig selector solution X. 
Therefore, we obtain an upper bound on ||^(if)||p as follows: 



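$$\begin{aligned}
\|\mathcal{A}(H)\|_2^2 &= \langle \mathcal{A}(H), \mathcal{A}(H)\rangle \\
&= \langle H, \mathcal{A}^*(\mathcal{A}(H))\rangle \\
&\le \|H\|_*\,\|\mathcal{A}^*(\mathcal{A}(H))\|_2 \\
&\le 2\lambda\,\|H\|_*.
\end{aligned}\tag{52}$$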
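To make the adjoint manipulations in (51) and (52) concrete, the sketch below represents the operator by a matrix `M` acting on the row-major vectorization of $X$, so that $\mathcal{A}^*(v)$ is $M^T v$ reshaped to $n_1 \times n_2$. The names `A`, `A_adj`, and `M` are hypothetical, and $\hat{X}$ is an arbitrary matrix rather than an actual Dantzig selector solution: identity (51) and the first three lines of (52) hold for any $\hat{X}$. NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, m = 5, 7, 20
M = rng.standard_normal((m, n1 * n2))   # hypothetical matrix representing the operator

def A(X):
    """Forward operator: A(X) = M vec(X), with vec taken in row-major order."""
    return M @ X.ravel()

def A_adj(v):
    """Adjoint operator, satisfying <A(X), v> = <X, A_adj(v)>."""
    return (M.T @ v).reshape(n1, n2)

X = rng.standard_normal((n1, n2))
X_hat = rng.standard_normal((n1, n2))   # any candidate estimate works for (51)-(52)
w = rng.standard_normal(m)
y = A(X) + w
H = X_hat - X
res = y - A(X_hat)                       # residual r in (51)

sq_norm = A(H) @ A(H)                                           # <A(H), A(H)>
pairing = np.sum(H * A_adj(A(H)))                               # <H, A*(A(H))>
duality = np.linalg.norm(H, "nuc") * np.linalg.norm(A_adj(A(H)), 2)
print(np.allclose(A_adj(w - res), A_adj(A(H))))                 # identity (51)
print(sq_norm, pairing, duality)                                # sq_norm == pairing <= duality
```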
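Equation (52), the definition of $\rho_{8r}$, and equation (49) together yield

$$\begin{aligned}
\rho_{8r}^2(\mathcal{A})\,\|H\|_F^2 &\le \langle \mathcal{A}(H), \mathcal{A}(H)\rangle \\
&\le 2\lambda\,\|H\|_* \\
&\le 4\sqrt{2r}\,\lambda\,\|H\|_F.
\end{aligned}\tag{53}$$

We conclude that

$$\|H\|_F \le \frac{4\sqrt{2r}}{\rho_{8r}^2(\mathcal{A})}\,\lambda, \tag{54}$$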

which is exactly the result of Theorem 3. ■ 
C. LASSO Estimator 

We derive a bound on the matrix LASSO estimator using the procedure developed in [5] (see 
also [27]). 

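Proof of Theorem 4: Suppose the noise $w$ satisfies $\|\mathcal{A}^*(w)\|_2 \le \kappa\mu$ for some small $\kappa \in (0, 1)$. Because $\hat{X}$ is a solution to (29), we have

$$\frac{1}{2}\|\mathcal{A}(\hat{X}) - y\|_2^2 + \mu\|\hat{X}\|_* \le \frac{1}{2}\|\mathcal{A}(X) - y\|_2^2 + \mu\|X\|_*.$$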



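Consequently, substituting $y = \mathcal{A}(X) + w$ yields

$$\begin{aligned}
\mu\|\hat{X}\|_* &\le \frac{1}{2}\|\mathcal{A}(X) - y\|_2^2 - \frac{1}{2}\|\mathcal{A}(\hat{X}) - y\|_2^2 + \mu\|X\|_* \\
&= \frac{1}{2}\|w\|_2^2 - \frac{1}{2}\|\mathcal{A}(\hat{X} - X) - w\|_2^2 + \mu\|X\|_* \\
&= \frac{1}{2}\|w\|_2^2 - \frac{1}{2}\|\mathcal{A}(\hat{X} - X)\|_2^2 + \langle \mathcal{A}(\hat{X} - X), w\rangle - \frac{1}{2}\|w\|_2^2 + \mu\|X\|_* \\
&\le \langle \mathcal{A}(\hat{X} - X), w\rangle + \mu\|X\|_* \\
&= \langle \hat{X} - X, \mathcal{A}^*(w)\rangle + \mu\|X\|_*.
\end{aligned}$$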
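Using the Cauchy-Schwarz type inequality $\langle Y, Z\rangle \le \|Y\|_*\,\|Z\|_2$ and writing $H = \hat{X} - X$, we get

$$\mu\|\hat{X}\|_* \le \|\hat{X} - X\|_*\,\|\mathcal{A}^*(w)\|_2 + \mu\|X\|_* \le \kappa\mu\,\|H\|_* + \mu\|X\|_*,$$

which leads to

$$\|\hat{X}\|_* \le \kappa\|H\|_* + \|X\|_*.$$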



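Therefore, similar to the argument in (42), we have

$$\begin{aligned}
\|X\|_* &\ge \|\hat{X}\|_* - \kappa\|H\|_* \\
&= \|X + H\|_* - \kappa\|H\|_* \\
&\ge \|X + H_c + H_0\|_* - \kappa\left(\|H_c\|_* + \|H_0\|_*\right) \\
&\ge \|X + H_c\|_* - \|H_0\|_* - \kappa\left(\|H_c\|_* + \|H_0\|_*\right) \\
&= \|X\|_* + \|H_c\|_* - \|H_0\|_* - \kappa\left(\|H_c\|_* + \|H_0\|_*\right) \\
&= \|X\|_* + (1 - \kappa)\|H_c\|_* - (1 + \kappa)\|H_0\|_*.
\end{aligned}$$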



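Consequently, we have

$$\|H_c\|_* \le \frac{1 + \kappa}{1 - \kappa}\,\|H_0\|_*,$$

an inequality slightly worse than (43) for small $\kappa$. Therefore, an argument similar to the one leading to (44) yields

$$\|H\|_* \le \frac{2}{1 - \kappa}\sqrt{2r}\,\|H\|_F, \tag{55}$$

or equivalently,

$$\tau(H) \le \frac{8r}{(1 - \kappa)^2}. \tag{56}$$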



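Now we need to establish a bound on $\|\mathcal{A}^*(\mathcal{A}(H))\|_2$. Since $\mathcal{A}(H) = (y - \mathcal{A}(X)) - (y - \mathcal{A}(\hat{X}))$,

$$\begin{aligned}
\|\mathcal{A}^*(\mathcal{A}(H))\|_2 &\le \|\mathcal{A}^*(y - \mathcal{A}(X))\|_2 + \|\mathcal{A}^*(y - \mathcal{A}(\hat{X}))\|_2 \\
&= \|\mathcal{A}^*(w)\|_2 + \|\mathcal{A}^*(y - \mathcal{A}(\hat{X}))\|_2 \\
&\le \kappa\mu + \|\mathcal{A}^*(y - \mathcal{A}(\hat{X}))\|_2.
\end{aligned}\tag{57}$$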



We follow the procedure in [5] (see also [27]) to estimate $\|\mathcal{A}^*(y - \mathcal{A}(\hat{X}))\|_2$. Since $\hat{X}$ is the solution to (29), the optimality condition yields

$$\mathcal{A}^*(y - \mathcal{A}(\hat{X})) \in \mu\,\partial\|\hat{X}\|_*, \tag{58}$$
where $\partial\|\hat{X}\|_*$ is the set of subgradients of $\|\cdot\|_*$ evaluated at $\hat{X}$. According to [10], if the compact singular value decomposition of $\hat{X}$ is $U\Sigma V^T$, then we have

$$\partial\|\hat{X}\|_* = \left\{UV^T + W : \|W\|_2 \le 1,\ U^T W = 0,\ WV = 0\right\}. \tag{59}$$

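As a consequence, we obtain $\mathcal{A}^*(y - \mathcal{A}(\hat{X})) = \mu(UV^T + W)$ for some such $W$, and

$$\|\mathcal{A}^*(y - \mathcal{A}(\hat{X}))\|_2 = \mu\,\|UV^T + W\|_2 \le \mu. \tag{60}$$

We used $\|UV^T + W\|_2 \le 1$, which holds because $U^T W = 0$ kills the cross term and $WV = 0$ gives $Wx = W(I - VV^T)x$, so that for every unit vector $x$

$$\|(UV^T + W)x\|_2^2 = \|V^T x\|_2^2 + \|W(I - VV^T)x\|_2^2 \le \|VV^T x\|_2^2 + \|(I - VV^T)x\|_2^2 = \|x\|_2^2 = 1. \tag{61}$$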
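The subdifferential description (59) and the norm bound (61) can be checked numerically. The sketch below builds a rank-$r$ matrix playing the role of $\hat{X}$, picks $W = U_\perp K V_\perp^T$ with $\|K\|_2 = 1$ (one convenient way, assumed here, to satisfy $U^T W = 0$ and $WV = 0$), and confirms that $\|UV^T + W\|_2 = 1$ and $\langle UV^T + W, \hat{X}\rangle = \|\hat{X}\|_*$. NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2, r = 6, 8, 2
X_hat = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))   # rank-r stand-in for X_hat

U_full, s, Vt_full = np.linalg.svd(X_hat)
U, V = U_full[:, :r], Vt_full[:r, :].T              # compact SVD factors
U_perp, V_perp = U_full[:, r:], Vt_full[r:, :].T    # orthonormal complements

# W = U_perp K V_perp^T satisfies U^T W = 0 and W V = 0; scaling K gives ||W||_2 = 1.
K = rng.standard_normal((n1 - r, n2 - r))
K /= np.linalg.norm(K, 2)
W = U_perp @ K @ V_perp.T

S = U @ V.T + W                                     # an element of the subdifferential (59)
print(np.abs(U.T @ W).max(), np.abs(W @ V).max())   # both ~0
print(np.linalg.norm(S, 2))                         # ~1, consistent with (61)
print(np.sum(S * X_hat), np.linalg.norm(X_hat, "nuc"))   # <S, X_hat> equals ||X_hat||_*
```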

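Following the same lines as in (52), we get

$$\|\mathcal{A}(H)\|_2^2 \le (\kappa + 1)\mu\,\|H\|_*. \tag{62}$$

Then, (55), (56), and (62) (which combines (57) and (60)), together with the definition of the ℓ*-CMSV, yield

$$\rho_{8r/(1-\kappa)^2}^2(\mathcal{A})\,\|H\|_F^2 \le \|\mathcal{A}(H)\|_2^2 \le (\kappa + 1)\mu\,\frac{2\sqrt{2r}}{1 - \kappa}\,\|H\|_F. \tag{63}$$

Dividing both sides by $\|H\|_F$ gives $\|\hat{X} - X\|_F \le \frac{1 + \kappa}{1 - \kappa}\cdot\frac{2\sqrt{2r}}{\rho_{8r/(1-\kappa)^2}^2(\mathcal{A})}\,\mu$; as a consequence, the conclusion of Theorem 4 holds. ■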
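For readers who want to experiment with the matrix LASSO, the sketch below solves (29) by proximal gradient descent with singular value thresholding, assuming (29) has the form $\min_X \frac{1}{2}\|y - \mathcal{A}(X)\|_2^2 + \mu\|X\|_*$, consistent with the first inequality of the proof. The solver is a standard technique chosen for illustration, not the method of this paper, and `svt` and `matrix_lasso` are hypothetical names. At a fixed point of the iteration, the optimality condition (58) forces $\|\mathcal{A}^*(y - \mathcal{A}(\hat{X}))\|_2 \le \mu$, which is the bound (60). NumPy is assumed.

```python
import numpy as np

def svt(Z, tau):
    """Singular value thresholding: the proximal map of tau * ||.||_* at Z."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def matrix_lasso(M, y, mu, shape, iters=3000):
    """Proximal gradient for 0.5 * ||y - M vec(X)||_2^2 + mu * ||X||_* (illustrative sketch)."""
    step = 1.0 / np.linalg.norm(M, 2) ** 2       # 1 / Lipschitz constant of the gradient
    X = np.zeros(shape)
    for _ in range(iters):
        grad = -(M.T @ (y - M @ X.ravel())).reshape(shape)   # gradient of the quadratic term
        X = svt(X - step * grad, step * mu)                  # proximal step on mu * ||X||_*
    return X

rng = np.random.default_rng(4)
n1, n2, r, m = 4, 5, 1, 100
M = rng.standard_normal((m, n1 * n2)) / np.sqrt(m)           # hypothetical Gaussian operator
X_true = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))
y = M @ X_true.ravel() + 0.01 * rng.standard_normal(m)
mu = 0.02

X_hat = matrix_lasso(M, y, mu, (n1, n2))
dual = (M.T @ (y - M @ X_hat.ravel())).reshape(n1, n2)       # A*(y - A(X_hat))
print("||A*(y - A(X_hat))||_2 =", np.linalg.norm(dual, 2), " mu =", mu)
print("||X_hat - X_true||_F =", np.linalg.norm(X_hat - X_true, "fro"))
```

The fixed step $1/\|M\|_2^2$ is the usual safe choice for proximal gradient; with more measurements than unknowns, as here, the iteration converges quickly.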

V. ℓ*-CONSTRAINED SINGULAR VALUES OF RANDOM MEASUREMENT ENSEMBLES

This section is devoted to analyzing the properties of the ℓ*-CMSVs for several important random sensing ensembles. Although the bounds in Theorems 2, 3, and 4 have concise forms, they are useless if the quantity involved, $\rho_r$, is zero or approaches zero for most matrices as $n_1$, $n_2$, $m$, and $r$ vary in a reasonable manner. We show that, at least for the isotropic and subgaussian ensembles, the ℓ*-CMSVs are bounded away from zero with high probability.
A. Gaussian sensing matrix

We start with the Gaussian ensemble. The derivation mimics the one for estimating the maximal and minimal singular values of a rectangular random matrix with Gaussian entries. The argument is due to Gordon and can be found in [28] and [29]. We modify it to take the constraint into account. Owing to the highly symmetric properties of Gaussian random variables, we expect simplified derivations for the Gaussian ensemble.
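Consider a linear operator

$$\mathcal{A} : (\mathbb{R}^{n_1\times n_2}, \|\cdot\|_F) \to (\mathbb{R}^m, \|\cdot\|_2), \quad X \mapsto \mathcal{A}(X), \tag{64}$$

where $\mathcal{A}$ is represented by a collection of $m$ matrices $\mathfrak{A} = \{A_1, A_2, \ldots, A_m\} \subset \mathbb{R}^{n_1\times n_2}$. We assume that the entries of the matrices $A_k \in \mathfrak{A}$ are i.i.d. Gaussian random variables from $\mathcal{N}(0, 1)$. Our goal is to study the behavior of the ℓ*-CMSV of $\mathcal{A}/\sqrt{m}$. We say that the operator $\mathcal{A}/\sqrt{m}$ comes from the Gaussian ensemble.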

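We note that the ℓ*-constrained minimal and maximal singular values can be equivalently expressed as

$$\rho_r^{\min}(\mathcal{A}) = \inf_{X \in \mathcal{H}_r}\ \sup_{v \in S^{m-1}} \langle \mathcal{A}(X), v\rangle, \tag{65}$$

$$\rho_r^{\max}(\mathcal{A}) = \sup_{X \in \mathcal{H}_r}\ \sup_{v \in S^{m-1}} \langle \mathcal{A}(X), v\rangle. \tag{66}$$

Here the set

$$\mathcal{H}_r = \{X \in \mathbb{R}^{n_1\times n_2} : \|X\|_F = 1,\ \|X\|_*^2 \le r\} \tag{67}$$

is a subset of the unit sphere of $(\mathbb{R}^{n_1\times n_2}, \|\cdot\|_F)$, and $S^{m-1}$ is the unit sphere in $\mathbb{R}^m$.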

Clearly, the quantity $\langle \mathcal{A}(X), v\rangle$ is a Gaussian random variable for fixed $X \in \mathcal{H}_r$ and $v \in S^{m-1}$. As a consequence, the problem of estimating $\rho_r^{\min}(\mathcal{A})$ and $\rho_r^{\max}(\mathcal{A})$ reduces to the study of a Gaussian process $\xi_{X,v}$ indexed by $\mathcal{H}_r \times S^{m-1}$, especially the behavior of its extremal values. Suitable tools for this purpose are Gaussian comparison theorems, in particular Gordon's inequality and Slepian's inequality. Under certain conditions on the second-order moments of the increments, these two inequalities compare the expected extremal values of a Gaussian process with those of another Gaussian process that is simpler to analyze.
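Lemma 4 [22, Chapter 3.1] Suppose $(\xi_{u,v})_{u\in U, v\in V}$ and $(\zeta_{u,v})_{u\in U, v\in V}$ are Gaussian processes with zero mean. If for all $u, u' \in U$ and $v, v' \in V$,

$$\mathbb{E}(\xi_{u,v} - \xi_{u',v'})^2 \le \mathbb{E}(\zeta_{u,v} - \zeta_{u',v'})^2, \tag{68}$$

then

$$\text{Slepian's inequality:}\quad \mathbb{E}\sup_{u\in U, v\in V}\xi_{u,v} \le \mathbb{E}\sup_{u\in U, v\in V}\zeta_{u,v}. \tag{69}$$

If

$$\begin{aligned}
\mathbb{E}(\xi_{u,v} - \xi_{u',v'})^2 &\le \mathbb{E}(\zeta_{u,v} - \zeta_{u',v'})^2 \quad \text{if } u \ne u', \\
\mathbb{E}(\xi_{u,v} - \xi_{u,v'})^2 &= \mathbb{E}(\zeta_{u,v} - \zeta_{u,v'})^2,
\end{aligned}\tag{70}$$

then

$$\text{Gordon's inequality:}\quad \mathbb{E}\inf_{u\in U}\sup_{v\in V}\xi_{u,v} \ge \mathbb{E}\inf_{u\in U}\sup_{v\in V}\zeta_{u,v}. \tag{71}$$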

To build intuition, consider only one index set and think of the index as time. Slepian's inequality states that the Gaussian process with the bigger step sizes, measured by the second-order moments of the increments, has the larger expected maximum over its lifetime. Gordon's inequality can be understood in a similar manner.

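To analyze $\rho_r^{\min}(\mathcal{A})$ and $\rho_r^{\max}(\mathcal{A})$, define two Gaussian processes

$$\xi_{X,v} = \langle \mathcal{A}(X), v\rangle = \sum_{i\le n_1,\, j\le n_2}\ \sum_{k\le m} A^k_{ij} X_{ij} v_k, \quad X \in \mathcal{H}_r,\ v \in S^{m-1}, \tag{72}$$

$$\zeta_{X,v} = \langle G, X\rangle + \langle h, v\rangle = \sum_{i\le n_1,\, j\le n_2} G_{ij} X_{ij} + \sum_{k\le m} h_k v_k, \quad X \in \mathcal{H}_r,\ v \in S^{m-1}, \tag{73}$$

where $\mathcal{A}$ is a Gaussian linear operator ($A^k_{ij}$ denotes the $(i,j)$th entry of $A_k$), $G$ is a Gaussian matrix, and $h$ is a Gaussian vector, all with i.i.d. $\mathcal{N}(0, 1)$ entries.

Clearly, $\mathbb{E}\xi_{X,v} = \mathbb{E}\zeta_{X,v} = 0$. We will focus on the application of Gordon's lemma to study $\rho_r^{\min}$. The analysis of $\rho_r^{\max}$ using Slepian's lemma follows a similar and actually simpler argument.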
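Algebraic manipulations using the facts $\|X\|_F = 1$, $\|v\|_2 = 1$, and Hölder's inequality show that

$$\begin{aligned}
\mathbb{E}(\xi_{X,v} - \xi_{X',v'})^2 &= \sum_{i,j}\sum_k (X_{ij}v_k - X'_{ij}v'_k)^2 \\
&\le \sum_{i,j}(X_{ij} - X'_{ij})^2 + \sum_k (v_k - v'_k)^2 \\
&= \mathbb{E}(\zeta_{X,v} - \zeta_{X',v'})^2;
\end{aligned}\tag{74}$$

indeed, the right-hand side minus the left-hand side equals $2(1 - \langle X, X'\rangle)(1 - \langle v, v'\rangle) \ge 0$. Furthermore, setting $X = X'$ immediately gives

$$\mathbb{E}(\xi_{X,v} - \xi_{X,v'})^2 = \sum_k (v_k - v'_k)^2 = \mathbb{E}(\zeta_{X,v} - \zeta_{X,v'})^2. \tag{75}$$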
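The comparison (74)-(75) can be sanity-checked numerically by sampling random unit-norm $X, X'$ and $v, v'$ and evaluating both increment formulas directly. The sketch only uses $\|X\|_F = 1$, so it samples $X$ from the unit Frobenius sphere rather than from $\mathcal{H}_r$; the helper names are hypothetical and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(5)
n1, n2, m = 3, 4, 6

def unit(shape):
    """Random point on the unit sphere (Frobenius norm for matrices)."""
    Z = rng.standard_normal(shape)
    return Z / np.linalg.norm(Z)

def incr_xi(X, v, Xp, vp):
    """E(xi_{X,v} - xi_{X',v'})^2 = sum_{i,j,k} (X_ij v_k - X'_ij v'_k)^2 for i.i.d. N(0,1) entries."""
    return np.sum((np.multiply.outer(X, v) - np.multiply.outer(Xp, vp)) ** 2)

def incr_zeta(X, v, Xp, vp):
    """E(zeta_{X,v} - zeta_{X',v'})^2 = ||X - X'||_F^2 + ||v - v'||_2^2."""
    return np.sum((X - Xp) ** 2) + np.sum((v - vp) ** 2)

worst_gap = -np.inf   # largest (xi increment) - (zeta increment); (74) says this is <= 0
worst_eq = 0.0        # largest violation of the equality (75)
for _ in range(1000):
    X, Xp, v, vp = unit((n1, n2)), unit((n1, n2)), unit(m), unit(m)
    worst_gap = max(worst_gap, incr_xi(X, v, Xp, vp) - incr_zeta(X, v, Xp, vp))
    worst_eq = max(worst_eq, abs(incr_xi(X, v, X, vp) - incr_zeta(X, v, X, vp)))
print("max gap in (74):", worst_gap, " max violation of (75):", worst_eq)
```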
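As a consequence, Gordon's inequality (71) implies that

$$\mathbb{E}\rho_r^{\min}(\mathcal{A}) = \mathbb{E}\inf_{X\in\mathcal{H}_r}\sup_{v\in S^{m-1}}\xi_{X,v} \ge \mathbb{E}\inf_{X\in\mathcal{H}_r}\sup_{v\in S^{m-1}}\zeta_{X,v}. \tag{76}$$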
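To compute the right-hand side of the previous formula, we proceed as

$$\begin{aligned}
\mathbb{E}\inf_{X\in\mathcal{H}_r}\sup_{v\in S^{m-1}}\zeta_{X,v} &= \mathbb{E}\inf_{X\in\mathcal{H}_r}\langle G, X\rangle + \mathbb{E}\sup_{v\in S^{m-1}}\langle h, v\rangle \\
&\ge -\mathbb{E}\sup_{X\in\mathcal{H}_r}\|G\|_2\|X\|_* + \mathbb{E}\|h\|_2 \\
&= -\sqrt{r}\,\mathbb{E}\|G\|_2 + \mathbb{E}\|h\|_2 \\
&\ge -3\sqrt{r}\sqrt{n_2} + \sqrt{m}.
\end{aligned}\tag{77}$$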
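We need some extra effort for the last inequality. Recall the convention $n_1 \le n_2$ and note that

$$\mathbb{E}\|G\|_2 \le \sqrt{n_1} + \sqrt{n_2} \le 2\sqrt{n_2}, \tag{78}$$

$$\mathbb{E}\|h\|_2 = \frac{\sqrt{2}\,\Gamma((m+1)/2)}{\Gamma(m/2)}, \tag{79}$$

where for (78) we used an upper bound for the expected largest singular value of a rectangular Gaussian matrix [28], [29], and equation (79) comes from an explicit computation involving the $\chi$ distribution. Since $r \ge 1$ and $3\sqrt{n_2} - \mathbb{E}\|G\|_2 \ge 0$ by (78), we have $\sqrt{r}\,(3\sqrt{n_2} - \mathbb{E}\|G\|_2) \ge 3\sqrt{n_2} - \mathbb{E}\|G\|_2$, so it suffices to show that

$$3\sqrt{n_2} - \mathbb{E}\|G\|_2 \ge \sqrt{m} - \mathbb{E}\|h\|_2. \tag{80}$$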
To this end, define $g \sim \mathcal{N}(0, I_{n_2})$ and assume $m \ge n_2$. Since $\sqrt{m} - \sqrt{2}\,\Gamma((m+1)/2)/\Gamma(m/2)$ is decreasing in $m$, we have

$$\begin{aligned}
\sqrt{m} - \mathbb{E}\|h\|_2 &\le \sqrt{n_2} - \mathbb{E}\|g\|_2 \\
&\le 3\sqrt{n_2} - 2\sqrt{n_2} \\
&\le 3\sqrt{n_2} - \mathbb{E}\|G\|_2.
\end{aligned}\tag{81}$$

Consequently, inequality (77) holds under the condition that $m \ge n_2$.
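The monotonicity used in the first step of (81) is easy to check numerically from the closed form (79). The sketch evaluates $\sqrt{k} - \mathbb{E}\|g\|_2$ for $g \sim \mathcal{N}(0, I_k)$, using log-Gamma values to avoid overflow for large $k$. It is a check over a finite range of $k$, not a proof, and it uses only the Python standard library; the function name is hypothetical.

```python
import math

def expected_norm(k):
    """E||g||_2 for g ~ N(0, I_k), i.e. sqrt(2) * Gamma((k+1)/2) / Gamma(k/2) as in (79)."""
    return math.sqrt(2.0) * math.exp(math.lgamma((k + 1) / 2) - math.lgamma(k / 2))

ks = range(1, 501)
gaps = [math.sqrt(k) - expected_norm(k) for k in ks]

# Positive and strictly decreasing on the checked range, as used in the first step of (81).
decreasing = all(a > b > 0 for a, b in zip(gaps, gaps[1:]))
print("gap at k=1:", gaps[0], " gap at k=500:", gaps[-1], " decreasing:", decreasing)
```

For $k = 1$ the gap is exactly $1 - \sqrt{2/\pi}$, since $\mathbb{E}|g| = \sqrt{2/\pi}$ for a standard Gaussian.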
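Therefore, Gordon's lemma yields, for the numerical constant $c = 3$,

$$\mathbb{E}\rho_r^{\min}\!\left(\frac{\mathcal{A}}{\sqrt{m}}\right) = \frac{1}{\sqrt{m}}\,\mathbb{E}\inf_{X\in\mathcal{H}_r}\|\mathcal{A}(X)\|_2 \ge 1 - c\sqrt{\frac{r n_2}{m}}, \tag{82}$$

as long as $m \ge n_2$. Note that the bound (82) is nontrivial only if $m > c^2 r n_2$, since otherwise we can use $0$ as a lower bound. So we can drop the condition $m \ge n_2$.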
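A quick Monte Carlo experiment makes (77) tangible in the special case $r = 1$. Because $\|X\|_* \ge \|X\|_F$ with equality only for rank-one matrices, $\mathcal{H}_1$ consists of unit-norm rank-one matrices, so $\inf_{X\in\mathcal{H}_1}\langle G, X\rangle = -\|G\|_2$ and $\sup_{v}\langle h, v\rangle = \|h\|_2$ exactly. The sketch compares the sample mean of $-\|G\|_2 + \|h\|_2$ with the lower bound $-3\sqrt{n_2} + \sqrt{m}$, and the sample mean of $\|G\|_2$ with $\sqrt{n_1} + \sqrt{n_2}$ from (78). The dimensions and trial count are arbitrary choices, and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(6)
n1, n2, m, trials = 10, 20, 40, 2000    # convention n1 <= n2; also m >= n2

# For r = 1, inf over H_1 of <G, X> equals -||G||_2, and sup over S^{m-1} of <h, v> is ||h||_2.
spec = np.empty(trials)
vals = np.empty(trials)
for t in range(trials):
    G = rng.standard_normal((n1, n2))
    h = rng.standard_normal(m)
    spec[t] = np.linalg.norm(G, 2)               # spectral norm ||G||_2
    vals[t] = -spec[t] + np.linalg.norm(h)       # inf_X sup_v zeta_{X,v} for this draw

print("mean ||G||_2:", spec.mean(), " vs sqrt(n1)+sqrt(n2):", np.sqrt(n1) + np.sqrt(n2))
print("mean inf-sup:", vals.mean(), " vs bound in (77):", -3 * np.sqrt(n2) + np.sqrt(m))
```

The bound in (77) is far from tight in this example; its purpose in the proof is only to give the clean form (82).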

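Similarly, using Slepian's lemma, we obtain a similar bound for the largest constrained singular value:

$$\mathbb{E}\rho_r^{\max}\!\left(\frac{\mathcal{A}}{\sqrt{m}}\right) = \frac{1}{\sqrt{m}}\,\mathbb{E}\sup_{X\in\mathcal{H}_r}\|\mathcal{A}(X)\|_2 \le 1 + c\sqrt{\frac{r n_2}{m}}. \tag{83}$$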



Now we have shown that $\rho_r^{\min}(\mathcal{A}/\sqrt{m})$ and $\rho_r^{\max}(\mathcal{A}/\sqrt{m})$ are close to one on average. What we really want is to show that the typical values of $\rho_r^{\min}(\mathcal{A}/\sqrt{m})$ and $\rho_r^{\max}(\mathcal{A}/\sqrt{m})$, rather than their averages, are close to one. Put another way, we need to exclude the possibility of the "lottery phenomenon" [30], where an event with small probability contributes most of the average. To this end, we show that the largest and smallest constrained singular values of $\mathcal{A}$ are 1-Lipschitz functions. By the concentration of measure phenomenon, nice functions (e.g., Lipschitz functions) that depend on many parameters are almost constant. As a consequence, $\rho_r^{\min}$ and $\rho_r^{\max}$ should concentrate around their means.

Recall that a function $F$ defined on a metric space $(\mathcal{X}, d)$ is called $L$-Lipschitz if $|F(x) - F(x')| \le L\,d(x, x')$ holds for all $x, x' \in \mathcal{X}$. We have the following lemma:

Lemma 5 The ℓ*-constrained singular values $\rho_r^{\max}(\cdot)$ and $\rho_r^{\min}(\cdot)$ are 1-Lipschitz in $(\mathbb{R}^{m\times n_1\times n_2}, d)$ when the metric $d(\mathcal{A}, \mathcal{B})$ is defined as

$$d(\mathcal{A}, \mathcal{B}) = \left(\sum_{k=1}^m \|A_k - B_k\|_F^2\right)^{1/2}. \tag{84}$$

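More precisely, we have

$$\begin{aligned}
|\rho_r^{\max}(\mathcal{A}) - \rho_r^{\max}(\mathcal{B})| &\le d(\mathcal{A}, \mathcal{B}), \\
|\rho_r^{\min}(\mathcal{A}) - \rho_r^{\min}(\mathcal{B})| &\le d(\mathcal{A}, \mathcal{B}).
\end{aligned}\tag{85}$$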

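Proof: Note that

$$\begin{aligned}
\rho_r^{\max}(\mathcal{A}) &= \sup_{X\in\mathcal{H}_r}\|\mathcal{A}(X)\|_2 \\
&= \sup_{X\in\mathcal{H}_r}\|\mathcal{B}(X) + (\mathcal{A} - \mathcal{B})(X)\|_2 \\
&\le \sup_{X\in\mathcal{H}_r}\|\mathcal{B}(X)\|_2 + \sup_{X\in\mathcal{H}_r}\|(\mathcal{A} - \mathcal{B})(X)\|_2 \\
&= \rho_r^{\max}(\mathcal{B}) + \sup_{X\in\mathcal{H}_r}\left(\sum_{k=1}^m |\langle A_k - B_k, X\rangle|^2\right)^{1/2} \\
&\le \rho_r^{\max}(\mathcal{B}) + \sup_{X\in\mathcal{H}_r}\left(\sum_{k=1}^m \|A_k - B_k\|_F^2\,\|X\|_F^2\right)^{1/2} \\
&= \rho_r^{\max}(\mathcal{B}) + d(\mathcal{A}, \mathcal{B}).
\end{aligned}\tag{86}$$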

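Due to the symmetric roles of $\mathcal{A}$ and $\mathcal{B}$, we conclude that

$$|\rho_r^{\max}(\mathcal{A}) - \rho_r^{\max}(\mathcal{B})| \le d(\mathcal{A}, \mathcal{B}). \tag{87}$$

Using similar arguments, we establish

$$|\rho_r^{\min}(\mathcal{A}) - \rho_r^{\min}(\mathcal{B})| \le d(\mathcal{A}, \mathcal{B}). \tag{88}$$

■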
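The key step of (86) is the pointwise bound $|\,\|\mathcal{A}(X)\|_2 - \|\mathcal{B}(X)\|_2\,| \le \|(\mathcal{A} - \mathcal{B})(X)\|_2 \le d(\mathcal{A}, \mathcal{B})$ for every $X$ with $\|X\|_F = 1$. The sketch checks it at random points of $\mathcal{H}_r$ (rank-$r$ matrices normalized to unit Frobenius norm, which satisfy $\|X\|_*^2 \le r$), with operators stored as arrays of shape $(m, n_1, n_2)$ and $d$ taken as in (84). It does not compute $\rho_r^{\max}$ or $\rho_r^{\min}$ themselves, which would require a nonconvex optimization over $\mathcal{H}_r$. Helper names are hypothetical; NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(7)
n1, n2, m, r = 4, 6, 10, 2

def apply_op(mats, X):
    """Return (<A_1, X>, ..., <A_m, X>) for a stack of matrices of shape (m, n1, n2)."""
    return np.einsum("kij,ij->k", mats, X)

def random_point_in_H(r):
    """Random rank-r matrix with ||X||_F = 1, hence ||X||_*^2 <= r."""
    X = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))
    return X / np.linalg.norm(X)

A_ops = rng.standard_normal((m, n1, n2))
B_ops = A_ops + 0.1 * rng.standard_normal((m, n1, n2))   # a nearby operator
d_AB = np.sqrt(np.sum((A_ops - B_ops) ** 2))              # metric (84)

gaps = []
for _ in range(500):
    X = random_point_in_H(r)
    gaps.append(abs(np.linalg.norm(apply_op(A_ops, X)) - np.linalg.norm(apply_op(B_ops, X))))
print("largest gap:", max(gaps), "<= d(A, B) =", d_AB)
```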

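By the functional form of the concentration of measure in Gauss space, we have

$$\mathbb{P}\left[\rho_r^{\max}(\mathcal{A}) - \mathbb{E}\rho_r^{\max}(\mathcal{A}) \ge t\right] \le \exp(-t^2/2), \tag{89}$$

and

$$\mathbb{P}\left[\rho_r^{\min}(\mathcal{A}) \le \sqrt{m} - c\sqrt{r n_2} - t\right] \le \mathbb{P}\left[\rho_r^{\min}(\mathcal{A}) - \mathbb{E}\rho_r^{\min}(\mathcal{A}) \le -t\right] \le \exp(-t^2/2).$$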
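Therefore, combining these with (82) and (83) and replacing $t$ with $\sqrt{m}\,t$, we have

$$\mathbb{P}\left[\rho_r^{\max}\!\left(\frac{\mathcal{A}}{\sqrt{m}}\right) \ge 1 + c\sqrt{\frac{r n_2}{m}} + t\right] \le \exp(-mt^2/2), \tag{90}$$

$$\mathbb{P}\left[\rho_r^{\min}\!\left(\frac{\mathcal{A}}{\sqrt{m}}\right) \le 1 - c\sqrt{\frac{r n_2}{m}} - t\right] \le \exp(-mt^2/2). \tag{91}$$

If we set

$$m \ge \frac{4c^2 r n_2}{\epsilon^2}, \qquad t = \frac{\epsilon}{2},$$

with $\epsilon \in (0, 1)$, then $c\sqrt{r n_2/m} \le \epsilon/2$, and we have

$$\mathbb{P}\left[\rho_r^{\max}(\mathcal{A}/\sqrt{m}) \ge 1 + \epsilon\right] \le \exp(-m\epsilon^2/8), \tag{92}$$

$$\mathbb{P}\left[\rho_r^{\min}(\mathcal{A}/\sqrt{m}) \le 1 - \epsilon\right] \le \exp(-m\epsilon^2/8). \tag{93}$$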

Therefore, for the Gaussian ensemble, the ℓ*-constrained singular values of $\mathcal{A}/\sqrt{m}$ concentrate around 1 with high probability.
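The arithmetic behind (92)-(93) is simple enough to tabulate. The sketch below computes the smallest integer $m$ satisfying $m \ge 4c^2 r n_2/\epsilon^2$ with $c = 3$ from (82), and the resulting failure probability bound $\exp(-m\epsilon^2/8)$ for each of the two events. These numbers reflect the unoptimized constants of this particular derivation, so they should be read as conservative. The function names are hypothetical; only the Python standard library is used.

```python
import math

def gaussian_sample_size(r, n2, eps, c=3.0):
    """Smallest integer m with m >= 4 c^2 r n2 / eps^2 (the condition before (92)-(93))."""
    return math.ceil(4.0 * c**2 * r * n2 / eps**2)

def failure_bound(m, eps):
    """Right-hand side of (92) and (93): exp(-m eps^2 / 8)."""
    return math.exp(-m * eps**2 / 8.0)

for r, n2, eps in [(1, 100, 0.5), (5, 100, 0.5), (5, 100, 0.25)]:
    m = gaussian_sample_size(r, n2, eps)
    print(f"r={r}, n2={n2}, eps={eps}: m >= {m}, failure probability <= {failure_bound(m, eps):.1e}")
```

The sample size grows linearly in $r n_2$, which matches the $r n_2$ degrees-of-freedom scaling of the bounds, while the failure probability decays exponentially in $m$.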



B. Sub-Gaussian sensing matrix 

In this section, we establish results for isotropic and subgaussian measurement operators that are comparable to those for the Gaussian ensemble presented in the previous section. The techniques used in Section V-A, for example, Slepian's and Gordon's inequalities and the concentration of measure phenomenon in Gauss space, depend heavily on the highly symmetric properties of Gaussian random variables and cannot be extended to the subgaussian case in a straightforward manner. In this section, we employ a recent estimate on the behavior of empirical processes involving subgaussian random variables. Before going into the details of this result, established in [23], we first formulate the ℓ*-CMSV for subgaussian measurement operators and discuss the difficulties.
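Suppose the linear operator $\mathcal{A}$ is generated in such a way that $\mathbb{E}\|\mathcal{A}(X)\|_2^2 = m\|X\|_F^2$ for any $X \in \mathbb{R}^{n_1\times n_2}$. We note that the condition

$$\sup_{X\in\mathcal{H}_r}\left|\frac{1}{m}\mathcal{A}(X)^T\mathcal{A}(X) - 1\right| = \sup_{X\in\mathcal{H}_r}\left|\frac{1}{m}\sum_{k=1}^m \langle A_k, X\rangle^2 - 1\right| \le \epsilon \tag{94}$$

is equivalent to $(\rho_r^{\max}(\mathcal{A}/\sqrt{m}))^2 \le 1 + \epsilon$ and $(\rho_r^{\min}(\mathcal{A}/\sqrt{m}))^2 \ge 1 - \epsilon$, which in turn imply $\rho_r^{\max}(\mathcal{A}/\sqrt{m}) \le 1 + \epsilon$ and $\rho_r^{\min}(\mathcal{A}/\sqrt{m}) \ge 1 - \epsilon$, since $\sqrt{1 + \epsilon} \le 1 + \epsilon$ and $\sqrt{1 - \epsilon} \ge 1 - \epsilon$ for $\epsilon \in (0, 1)$.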



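As usual, the operator $\mathcal{A}$ is represented by a collection of matrices $\mathfrak{A} = \{A_1, \ldots, A_m\}$. We define a class of functions parameterized by $X$ as $\mathcal{F}_r := \{f_X(\cdot) = \langle X, \cdot\rangle : X \in \mathcal{H}_r\}$. Recall that

$$\mathcal{H}_r = \{X \in \mathbb{R}^{n_1\times n_2} : \|X\|_F = 1,\ \|X\|_*^2 \le r\}. \tag{95}$$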

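Denote by $P_m$ the empirical measure that puts equal mass at each of the $m$ random observations $A_1, \ldots, A_m$, i.e.,

$$P_m = \frac{1}{m}\sum_{k=1}^m \delta_{A_k}, \tag{96}$$

with $\delta_A(\cdot)$ the Dirac measure on $\mathbb{R}^{n_1\times n_2}$ that puts unit mass at $A$. We realize that $\{\frac{1}{m}\sum_{k=1}^m \langle X, A_k\rangle^2\}_{X\in\mathcal{H}_r}$ is nothing but the empirical process $\{P_m(f^2)\}_{f\in\mathcal{F}_r}$. We slightly abuse notation and use $\mathbb{E}f^2$ to denote $\mathbb{E}f^2(A)$. Then, our goal is to estimate

$$\mathbb{E}\sup_{f\in\mathcal{F}_r}\left|P_m(f^2) - \mathbb{E}f^2\right| \tag{97}$$

and

$$\mathbb{P}\left\{\sup_{f\in\mathcal{F}_r}\left|P_m(f^2) - \mathbb{E}f^2\right| > \epsilon\right\}, \tag{98}$$

a central topic in the study of empirical processes.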
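For a single fixed $f_X \in \mathcal{F}_r$, the quantity inside (97)-(98) is just a sample average, and the law of large numbers makes $P_m(f_X^2)$ close to $\mathbb{E}f_X^2 = \|X\|_F^2 = 1$. The sketch below uses Rademacher ($\pm 1$) entries as one example of an isotropic subgaussian ensemble, an assumption made for illustration. It evaluates one function only; controlling the supremum over all of $\mathcal{F}_r$ is exactly what requires the empirical process machinery of [23]. NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(8)
n1, n2, m = 5, 8, 20000

# A fixed X on the unit Frobenius sphere; f_X(A) = <X, A>.
X = rng.standard_normal((n1, n2))
X /= np.linalg.norm(X)

# Rademacher entries: vec(A_k) is isotropic and subgaussian, so E f_X(A)^2 = ||X||_F^2 = 1.
A_samples = rng.choice([-1.0, 1.0], size=(m, n1, n2))
f_values = np.einsum("kij,ij->k", A_samples, X)   # f_X(A_k) for k = 1, ..., m
P_m_f2 = np.mean(f_values ** 2)                   # P_m(f_X^2) with P_m as in (96)
print("P_m(f_X^2) =", P_m_f2, " E f_X^2 = 1")
```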

Another slightly different (but closely related) way to view (94) is to consider 



dcf 



X 



^ m 

m ^-^ 



X e Ur 



(99) 



as a random process. Then $\mathbb{E}\sup_{X\in\mathcal{H}_r}\xi_X$ can be bounded using, for example, Dudley's inequality [22], [24], which involves the entropy numbers or covering numbers of $\mathcal{H}_r$ [31], [32]. This idea is used to establish bounds on the minimal and maximal singular values of rectangular random matrices. There are several challenges associated with this approach for estimating the ℓ*-CMSV of general isotropic and subgaussian operators. First, a good estimate of the entropy numbers of $\mathcal{H}_r$ is not readily available. Second, even if the elements of the sensing operator are subgaussian random variables, the increments $|\xi_X - \xi_Z|$ are actually a mixture of subexponential and subgaussian random variables, instead of being purely subgaussian. This causes problems when Dudley's inequality, or the sharper generic chaining bound presented later, is applied. In addition, a corresponding concentration of measure result for subgaussian distributions is also absent.

Before we turn to the general empirical process result of [23], developed through a delicate use of the powerful generic chaining idea, we need some notation and definitions. A key concept in studying general Gaussian processes, as well as the empirical process $\{P_m(f^2)\}_{f\in\mathcal{F}_r}$, is the $\gamma_p$ functional defined below. We need some setup first. For any set $\mathcal{X}$, an admissible sequence is a sequence of increasing partitions $\{Q_k\}_{k\ge 0}$ of $\mathcal{X}$ such that $\operatorname{card}(Q_0) = 1$ and $\operatorname{card}(Q_k) \le 2^{2^k}$ for $k \ge 1$. By a sequence of increasing partitions, we mean that every set in $Q_{k+1}$ is contained in some set of $Q_k$. We will use $Q_k(X)$ to denote the unique set in the partition $Q_k$ that contains $X \in \mathcal{X}$. The diameter of $Q_k(X)$ is denoted by $\Delta(Q_k(X))$. Then we have the following definition of the $\gamma_p$ functional associated with a metric space:

Definition 7 Suppose $(\mathcal{X}, d)$ is a metric space and $p > 0$. We define

$$\gamma_p(\mathcal{X}, d) = \inf\ \sup_{X\in\mathcal{X}}\ \sum_{k\ge 0} 2^{k/p}\,\Delta(Q_k(X)), \tag{100}$$

where the infimum is taken over all admissible sequences.



The importance of the $\gamma_p$ functional lies in its relationship with the behavior of a Gaussian process indexed by a metric space when the metric coincides with the one induced by the Gaussian process. More precisely, suppose $\{\xi_X\}_{X\in\mathcal{X}}$ is a Gaussian process indexed by the metric space $(\mathcal{X}, d)$ with

$$d(X, Z) = \left(\mathbb{E}(\xi_X - \xi_Z)^2\right)^{1/2}; \tag{101}$$

then we have

$$c\,\gamma_2(\mathcal{X}, d) \le \mathbb{E}\sup_{X\in\mathcal{X}}\xi_X \le C\,\gamma_2(\mathcal{X}, d) \tag{102}$$

for some numerical constants $c$ and $C$. The upper bound was first established by Fernique [33], and the lower bound was obtained by Talagrand using majorizing measures [34]. The rather difficult concept of majorizing measures has been considerably simplified through the notion of "generic chaining", an idea that dates back to Kolmogorov and has been greatly advanced in recent years by Talagrand [24]. The upper bound (generic chaining bound)

$$\mathbb{E}\sup_{X\in\mathcal{X}}\xi_X \le C\,\gamma_2(\mathcal{X}, d) \tag{103}$$

is actually applicable as long as the increments of $\{\xi_X\}_{X\in\mathcal{X}}$ have subgaussian tails:

$$\mathbb{P}\{|\xi_X - \xi_Z| > t\} \le c\exp\left(-\frac{t^2}{2d^2(X, Z)}\right), \quad \forall\, t > 0, \tag{104}$$

for $d(X, Z)$ defined in (101), and

$$\mathbb{E}\xi_X = 0, \quad \forall X \in \mathcal{X}. \tag{105}$$

Under conditions (104) and (105), an immediate consequence of the generic chaining bound is the well-known Dudley's inequality [22], [24]

$$\mathbb{E}\sup_{X\in\mathcal{X}}\xi_X \le C\sum_{k\ge 0}2^{k/2}e_k(\mathcal{X}), \tag{106}$$

or equivalently, in the more familiar integral form,

$$\mathbb{E}\sup_{X\in\mathcal{X}}\xi_X \le C\int_0^\infty \sqrt{\log N(\mathcal{X}, d, \epsilon)}\,d\epsilon, \tag{107}$$

where $e_k(\mathcal{X})$ and $N(\mathcal{X}, d, \epsilon)$ are the entropy number and covering number [31], [32], respectively. In general, the generic chaining bound (103) is tighter than Dudley's entropy bounds (106) and (107).

We cannot apply the generic chaining bound (103) directly to the random process (99) because this random process does not have increments with subgaussian tails, even if the entries of $A_k$ in (99) are independent subgaussian random variables. (The zero-mean condition is a minor issue.) As a matter of fact, the increments of $\{\xi_X\}_{X\in\mathcal{X}}$ defined in (99) exhibit a mixture of subgaussian and subexponential tails:

$$\mathbb{P}\{|\xi_X - \xi_Z| > t\} \le 2\exp\left(-\min\left(\frac{t^2}{d^2(X, Z)}, \frac{t}{d(X, Z)}\right)\right), \quad \forall X, Z \in \mathcal{X},\ t > 0. \tag{108}$$
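To see where the subexponential part of (108) comes from, note that for Gaussian $A_k$ and $\|X\|_F = \|Z\|_F = 1$, each summand of the increment factors as $\langle A_k, X\rangle^2 - \langle A_k, Z\rangle^2 = \langle A_k, X - Z\rangle\langle A_k, X + Z\rangle$, and the two factors are independent Gaussians because $\langle X - Z, X + Z\rangle = \|X\|_F^2 - \|Z\|_F^2 = 0$. The sketch compares the empirical tail of a product of two independent standard Gaussians (unit scaling assumed for simplicity) with the tail of a single Gaussian. NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(9)
N = 1_000_000
g1 = rng.standard_normal(N)
g2 = rng.standard_normal(N)

t = 4.0
p_product = np.mean(np.abs(g1 * g2) > t)   # product of Gaussians: tail decays roughly like exp(-t)
p_gauss = np.mean(np.abs(g1) > t)          # single Gaussian: tail decays like exp(-t^2 / 2)
print(f"P(|g1*g2| > {t}) ~ {p_product:.1e}   P(|g1| > {t}) ~ {p_gauss:.1e}")
```

The product's tail is heavier by roughly two orders of magnitude at $t = 4$, which is why a purely subgaussian chaining bound such as (103) does not apply directly.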

In this case, the generic chaining bound becomes [24]

$$\mathbb{E}\sup_{X\in\mathcal{X}}\xi_X \le C\left(\gamma_1(\mathcal{X}, d) + \gamma_2(\mathcal{X}, d)\right). \tag{109}$$

Dudley's entropy bounds (106) and (107) also need to be modified to include an additional term corresponding to the subexponential tail. We remark that the $\gamma_1(\mathcal{X}, d)$ term and its counterparts in Dudley's entropy inequality present major challenges in applying these bounds to our ℓ*-CMSV problem. We need to circumvent this term in some way.

One major contribution of [23] is to show that the $\gamma_1(\mathcal{X}, d)$ term in (109) is not necessary for the empirical process $\{P_m(f^2)\}_{f\in\mathcal{F}_r}$. In particular, the expectation $\mathbb{E}\sup_{f\in\mathcal{F}_r}|P_m(f^2) - \mathbb{E}f^2|$ behaves as if the underlying process were subgaussian, even though it is a mixture of subgaussian and subexponential. One key condition, as commented by the authors of [23], is that all functions in $\mathcal{F}_r$ have the same second-order moments. We now present the empirical process result:

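Theorem 5 Let $A \in \mathbb{R}^n$ be a random vector which induces a measure $\mu$ on $\mathbb{R}^n$, let $A_1, \ldots, A_m$ be independent copies of $A$, and let $\mathcal{F}$ be a subset of the unit sphere of $L_2(\mathbb{R}^n, \mu)$ with $\operatorname{diam}(\mathcal{F}, \|\cdot\|_{\psi_2}) = \alpha$. Then there exist absolute constants $c_1, c_2, c_3$ such that for any $\epsilon > 0$ and $m \ge 1$ satisfying

$$m \ge \frac{c_1\,\alpha^2\,\gamma_2^2(\mathcal{F}, \|\cdot\|_{\psi_2})}{\epsilon^2}, \tag{110}$$

with probability at least $1 - \exp(-c_2\epsilon^2 m/\alpha^4)$,

$$\sup_{f\in\mathcal{F}}\left|\frac{1}{m}\sum_{k=1}^m f^2(A_k) - \mathbb{E}f^2(A)\right| \le \epsilon. \tag{111}$$

Furthermore, if $\mathcal{F}$ is symmetric, we have

$$\mathbb{E}\sup_{f\in\mathcal{F}}\left|\frac{1}{m}\sum_{k=1}^m f^2(A_k) - \mathbb{E}f^2(A)\right| \le c_3\max\left\{\frac{\alpha\,\gamma_2(\mathcal{F}, \|\cdot\|_{\psi_2})}{\sqrt{m}},\ \frac{\gamma_2^2(\mathcal{F}, \|\cdot\|_{\psi_2})}{m}\right\}. \tag{112}$$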

As we remarked before, the key advantage of this theorem is that the results involve only $\gamma_2(\mathcal{F}, \|\cdot\|_{\psi_2})$. We now develop a proof of Theorem 1 based on Theorem 5.

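Proof of Theorem 1: Consider the function set $\mathcal{F} = \mathcal{F}_r = \{f_X(\cdot) = \langle X, \cdot\rangle : \|X\|_F = 1,\ \|X\|_*^2 \le r\}$. Assume $A \in \mathbb{R}^{n_1\times n_2}$ is such that $\operatorname{vec}(A)$ is isotropic and subgaussian with constant $L$. As a consequence of the isotropy of $\operatorname{vec}(A)$ and $\|\operatorname{vec}(X)\|_2 = \|X\|_F = 1$, $\mathcal{F}_r$ is a subset of the unit sphere of $L_2(\mathbb{R}^{n_1 n_2}, \mu)$. The symmetry of $\mathcal{F}_r$ yields

$$\begin{aligned}
\alpha = \operatorname{diam}(\mathcal{F}_r, \|\cdot\|_{\psi_2}) &= 2\sup_{X\in\mathcal{H}_r}\|\langle X, A\rangle\|_{\psi_2} \\
&\le 2\sup_{X\in\mathcal{H}_r}L\,\|X\|_F \\
&= 2L.
\end{aligned}\tag{113}$$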
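Now the key is to compute $\gamma_2(\mathcal{F}_r, \|\cdot\|_{\psi_2})$. Due to (102), the problem reduces to computing $\mathbb{E}\sup_{X\in\mathcal{H}_r}\zeta_X$ (actually an upper bound suffices), where $\{\zeta_X\}_{X\in\mathcal{H}_r}$ is the canonical Gaussian process on $\mathcal{H}_r$:

$$\zeta_X = \langle G, X\rangle, \quad \operatorname{vec}(G) \sim \mathcal{N}(0, I_{n_1 n_2}),\ X \in \mathcal{H}_r. \tag{114}$$

Indeed, the subgaussian assumption gives $\|f_X - f_Z\|_{\psi_2} = \|\langle X - Z, A\rangle\|_{\psi_2} \le L\|X - Z\|_F$, and $\|X - Z\|_F$ is exactly the metric (101) induced by $\zeta$. Clearly, we have

$$\begin{aligned}
\gamma_2(\mathcal{F}_r, \|\cdot\|_{\psi_2}) &\le c\,\mathbb{E}\sup_{X\in\mathcal{H}_r}\langle G, X\rangle \\
&\le c\,\sup_{X\in\mathcal{H}_r}\|X\|_*\,\mathbb{E}\|G\|_2 \\
&\le c\sqrt{r}\sqrt{n_2},
\end{aligned}\tag{115}$$

where the constant $c$ absorbs $L$ and numerical factors, and the last step uses (78). As a consequence, the conclusions of Theorem 1 hold. ■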
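The second inequality in (115) is the duality $\langle G, X\rangle \le \|G\|_2\|X\|_*$. The sketch below evaluates it at one explicit feasible point of $\mathcal{H}_r$, namely $X = U_r V_r^T/\sqrt{r}$ built from the top $r$ singular vectors of $G$ (a choice made for illustration), for which $\|X\|_F = 1$ and $\|X\|_* = \sqrt{r}$. The ratio $\langle G, X\rangle/(\sqrt{r}\,\|G\|_2)$ equals $(s_1 + \cdots + s_r)/(r s_1)$, so it never exceeds 1 and approaches 1 when the top singular values of $G$ are similar. The sketch does not compute the supremum over $\mathcal{H}_r$; NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(10)
n1, n2, r, trials = 10, 30, 4, 200

ratios = []
for _ in range(trials):
    G = rng.standard_normal((n1, n2))
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    X = U[:, :r] @ Vt[:r, :] / np.sqrt(r)          # ||X||_F = 1 and ||X||_* = sqrt(r), so X is in H_r
    value = np.sum(G * X)                          # <G, X> = (s_1 + ... + s_r) / sqrt(r)
    bound = np.linalg.norm(X, "nuc") * s[0]        # ||X||_* ||G||_2 = sqrt(r) ||G||_2
    ratios.append(value / bound)

print("ratio range:", min(ratios), max(ratios))     # always within (0, 1]
```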

VI. Conclusions 

In this paper, the ℓ*-constrained minimal singular value of a measurement operator, which measures the invertibility of the measurement operator restricted to matrices with small ℓ*-rank, is proposed to quantify the stability of low-rank matrix reconstruction. The reconstruction errors of the matrix Basis Pursuit, the matrix Dantzig selector, and the matrix LASSO estimator are concisely bounded using the ℓ*-CMSV. Using a generic chaining bound for empirical processes, we demonstrate that the ℓ*-CMSV is bounded away from zero with high probability for the subgaussian measurement ensembles, as long as the number of measurements is relatively large.

In future work, we will study the feasibility of using the ℓ*-CMSV to bound the reconstruction error of iterative algorithms. More importantly, we will also design algorithms to efficiently compute the ℓ*-CMSV and use the ℓ*-CMSV as a basis for designing optimal sensing operators.